Best Ollama models for coding (Reddit)

So far, they all seem the same regarding code generation.

For coding I had the best experience with CodeQwen models. This method has a marked improvement on the code-generating abilities of an LLM. The prompt template also doesn't seem to be supported by default in oobabooga, so you'll need to add it manually.

I don't roleplay, but I liked the Westlake model for uncensored creative writing. Also, I've yet to find a model that is "only useful for people who want creative text gen": every model I've tested that's been good for that has also been good for standard NLP tasks. Though that model is too verbose for instructions or tasks; it's really a writing model only, in the (limited, I admit) testing I did.

If applicable, please separate out your best models by use case:
- Best overall / general use
- Best for coding
- Best for RAG
- Best conversational (chatbot applications)
- Best uncensored

Yeah, exactly. It can write join code accurately.

Ollama also works with third-party graphical user interface (GUI) tools.

My current and previous MacBooks have had 16GB and I've been fine with it (I run 13b models quite well with Ollama; my current choice is wizard-vicuna-uncensored:13b, but I'm always looking for a better general-purpose uncensored model), but given local models I think I'm going to have to go to whatever the maximum available RAM is for my next machine. This uses models in GGML/GGUF format.

May 2, 2025 · Introduction: Large Language Models (LLMs) have profoundly reshaped the software development landscape by May 2025. In the rapidly evolving landscape of software development, Ollama models are emerging as game-changing tools that are revolutionizing how developers approach their craft. Evolving beyond basic code completion, these sophisticated AI co-pilots now debug complex code, refactor entire codebases, generate comprehensive documentation, translate between programming languages, and even assist in high-level system design. This guide explores the top Ollama models that developers and programmers can use, and takes you through everything you need to know about selecting and maximizing their potential for your coding journey.

Dec 23, 2024 · What are Ollama models? Ollama models are large language models (LLMs) developed by Ollama. These models learn from huge datasets of text and code, and they handle a range of natural language processing (NLP) tasks with ease, including text generation, translation, and roleplay. They run locally.

May 21, 2025 · The model achieves remarkable performance on SWE-bench, which positions it as the #1 open-source model. Devstral is finetuned from Mistral Small 3.1 and therefore has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only; before fine-tuning from Mistral-Small-3.1, the vision encoder was removed.

8x7B: BagelMIsteryTour-v2-8x7B, probably the best RP model I've ever run, since it hits a great balance of prose and intelligence.

I'd really recommend you play around with 7b models at q4 and try them against a few real-life test cases to see what works. I have tested it with GPT-3.5 and GPT-4. I suggest you first understand what size of model works for you, then try different model families of similar size (i.e.: Llama, Mistral, Phi).

With ollama I can run both these models at decent speed on my phone (Galaxy S22 Ultra). On my PC I use codellama-13b with ollama and am downloading 34b to see if it runs at decent speeds.

I currently use llama3.1 8B for summarization of text of all types.

I have been running a Contabo ubuntu VPS server for many years. I use this server to run my automations using Node-RED (easy for me because it is visual programming), run a Gotify server, a PLEX media server and an InfluxDB server. I am now looking to do some testing with open-source LLMs and would like to know what the best pre-trained model to use is. I think this question should be discussed every month.

Asking the model a question in just one go ("Please write me a snake game in python") and then taking the code it wrote and running with it: that's the way a lot of people use models, but there are various workflows that can GREATLY improve the answer if you take that answer and do a little more work on it.
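
To make the "one go vs. a little more work" point concrete, here is a minimal sketch using the official ollama Python package (pip install ollama) against a local Ollama server; the model tag and prompts are just placeholder examples, not the commenter's actual setup:

    import ollama

    MODEL = "codellama:13b"  # example tag; use whatever model you have pulled

    # Style 1: one-shot. Ask once and take whatever comes back.
    draft = ollama.generate(model=MODEL, prompt="Please write me a snake game in python")["response"]

    # Style 2: do a little more work on the answer. Feed the draft back
    # and ask for a bug-hunting pass before accepting it.
    fixed = ollama.generate(
        model=MODEL,
        prompt="Find bugs in this Python code and return a corrected version:\n\n" + draft,
    )["response"]
    print(fixed)
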
But for fiction I really disliked it; when I tried it yesterday I had a terrible experience. Am I missing something? Maybe it's my settings, which do work great on the other models, but it had multiple logical errors, character mix-ups, and it kept getting my name wrong. This is the kind of behavior I expect out of a 2.7B model, not a 13B llama model.

I've been using magicoder for writing basic SQL stored procedures and it's performed pretty strongly, especially for such a small model.

There's one generalist model that I sometimes use/consult when I can't get a result from a smaller model.

I tried starcoder2:7b for a fairly simple case in Python just to get a feel for it, and it generated back a whole bunch of C/C++ code with a lot of comments in Chinese, and it kept printing it out like in an infinite loop.

Thanks for the answer, I'll try it out more.

For coding the situation is way easier, as there are just a few coding-tuned models. Some models are better at following instructions, but you'll find yourself doing 8-9 shots to get something correct. Imo codellama-instruct is the best for coding questions. Wish it didn't require a beefy PC though.

I prefer to chat with LLMs in my native language German, in addition to English, and few local models can do that as well as those from Mistral and Cohere.

Hello everyone, currently looking for recommendations for a decent model. I code mostly in Python and Pascal (Delphi). What other models do you recommend?

Many folks frequently don't use the best available model because it's not the best for their requirements / preferences (e.g. task(s), language(s), latency, throughput, costs, hardware, etc.).

The easier way is to install the 'Private AI' app. I use the Phi-3 model, which you can install directly in the app. The app is free, except for the larger models, which you probably don't want to run on a phone anyway.

I'm using: Mistral-7B-claude-chat.q5_k_m.gguf, embeddings = all-MiniLM-L6-v2.

So the best thing is the work of a team of models, not just one. If you allow models to work together on the code base and allow them to criticize each other and suggest improvements to the code, the result will be better; that is, if you need the best possible code, but it turns out to be expensive.
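
A rough sketch of that "team of models" idea, hitting Ollama's documented /api/generate REST endpoint directly; the two model tags are arbitrary examples of coding-tuned models, and the task is a placeholder:

    import requests

    API = "http://localhost:11434/api/generate"

    def ask(model: str, prompt: str) -> str:
        """Request one non-streaming completion from a local Ollama model."""
        r = requests.post(API, json={"model": model, "prompt": prompt, "stream": False}, timeout=300)
        r.raise_for_status()
        return r.json()["response"]

    task = "Write a Python function that validates an IBAN."
    draft = ask("codellama:13b", task)

    # A second model criticizes the first model's work...
    critique = ask("deepseek-coder:6.7b", "List concrete problems in this solution:\n\n" + draft)

    # ...and the first model revises its draft using that critique.
    final = ask("codellama:13b", f"Task: {task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\nRewrite the draft, fixing every issue raised.")
    print(final)

Each extra round costs another full generation, which is the "expensive" part the commenter mentions.
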
For coding-related tasks that are not actual code, like the best strategy to solve a problem and such: TheBloke/tulu-2-dpo-70B-GGUF. I never go all the way to TheBloke/goliath-120b-GGUF, but it's on standby.

I see specific models are for specific purposes, but most models do respond well to pretty much anything.

I don't have too much experience with them, since the bigger models that I use tend to be pretty good at answering coding-related questions, even if they were not exclusively trained on coding problems. I am not a coder, but they helped me write a small Python program for my use case.

I have a fine-tuned model on C# source code that appears to "understand" questions about C# solutions fairly well. (See also the thread "Best local coding LLM setup for 16GB VRAM" on r/LocalLLaMA.)

Hey, fellow M1 16gb user! I personally use the following models: OpenHermes Neural 7B q4 (4.37gb) and TinyLlama-1.1 (637mb).

Mar 10, 2025 · I run Ollama on my desktop with 64GB RAM and an RTX 4080.

You should check out continue.dev and ollama; those are easy enough to deploy for a VSCode code assistant, and the developer treats local models as first-class citizens. I have been trying Ollama for a while together with continue.dev on VSCode on a MacBook M1 Pro. I have tried codellama:7b, codegemma:2b and llama2:8b; I got the best tab-completion results with the codellama model, and the best code-implementation suggestions in chat with llama3 for Java. They are good at answering Q&A. Following is the config I used.
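
The actual config was lost in this scrape. As a rough stand-in, a continue.dev config.json wired to local Ollama models typically looks something like the sketch below; the field names follow continue's JSON config format as I understand it, and the model tags simply mirror the ones the commenter mentions, so treat all of it as an illustrative assumption rather than their real file:

    {
      "models": [
        {
          "title": "Llama 3 (chat)",
          "provider": "ollama",
          "model": "llama3:8b"
        }
      ],
      "tabAutocompleteModel": {
        "title": "CodeLlama 7B (autocomplete)",
        "provider": "ollama",
        "model": "codellama:7b"
      }
    }
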
This is why you have to keep switching up the model: their training data does not 100% overlap. Also, people have different tastes for what they want the LLM to do to their code, on the scale between "do too little" and "do too much".

This has not been my experience.

Genuinely this, not in a shitty "ugh, we get asked this so much" way but in a "keeping a thread on the current recommended models at different sizes that gets refreshed frequently is a good idea, because there's just so much and it's very hard to follow sometimes for some people" way. After searching for this question, the newest post on it was 5 months ago, so I'm looking for an updated answer. What's the current best general-use model that will work with an RTX 3060 12GB VRAM and 16GB system RAM?

You should have no issue running models up to 120b with that much RAM, but large models will be incredibly slow (like 10+ minutes per response) running on CPU only. I guess you can try to offload 18 layers on the GPU and keep even more spare RAM for yourself. Llama 3 70b Q5_K_M GGUF on RAM + VRAM: 1.25 tokens/s; it will occupy about 53GB of RAM and 8GB of VRAM with 9 offloaded layers using llama.cpp. Recommend sticking to 13b models unless you're incredibly patient.

Opus is most likely 100b+ (probably 300b?) parameters, so it's going to be far better than a 7b model, even one specialized for coding. There is no 7b model that actually is similar in performance. That said, you can check out codeqwen 7b or the wavecoder models from Microsoft (6.7b at size).

I think it ultimately boils down to wizardcoder-34B (a finetune of llama) and magicoder-6.7B, but what about highly performant models like smaug-72B? Intending to use the LLM with code-llama on nvim.

Also, does it make sense to run these models locally when I can just access gpt3.5 on the web, or even a few trial runs of gpt4?

Dec 2, 2024 · Ollama offers a range of models tailored to diverse programming needs, from code generation to image reasoning. But they are all generalist models.

OLMo 2 is a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks.

Sorting the model list: when you visit the Ollama Library at ollama.ai, you will be greeted with a comprehensive list of available models to browse. To narrow down your options, you can sort this list using different parameters. Featured: this sorting option showcases the models recommended by the Ollama team as the best choices for most users.

After a lot of failure and disappointment running Autogen with local models, I tried the rising star of agent frameworks, CrewAI. It is a multi-agent framework based on LangChain and utilizes LangChain's recently added support for Ollama's JSON mode for reliable function calling. It uses self-reflection to reiterate on its own output and decide if it needs to refine the answer.

You ask the model to narrate events about a simulated virtual world and virtual characters, and interact with them, and it gives you a virtual narrated world where you can do anything. With more advanced models you can have a coherent inventory, health points, etc. Weak models will mess up little details or even mess up the plot.

How do I combine the snippets ollama provides into one long block of code as well? Is there something like an interface, model, or project I should be using as an ollama coding buddy? Feel free to add onto this if you wish.

Open WebUI + Ollama backend: initially, I set up Open WebUI (via Pinokio) with Ollama as the backend (installed via winget). But alas, I encountered some RAG-related and backup issues. Plus, Ollama's uncensored models left something to be desired. LM Studio: then I switched gears to LM Studio, which boasts an impressive array of uncensored models.

I also recommend openwebui as a front end; I really like its prompt templating, which allows me to directly use the clipboard.

For my own personal use, Command R+ is the best local model since Mixtral 8x7B, and I've been using either since their release. Command R+ has replaced Mixtral as my daily driver. Sometimes I need to negotiate with it, though, to get the best output.

I'm trying PipableAI/pip-sql-1.3b at the moment. Edit: this is the best open-source model that I've tried for SQL queries. I'll update in some time.

I have a 3080Ti 12GB, so chances are 34b is too big, but 13b runs incredibly quickly through ollama.

Models usually get trained at fp32 or fp16 and are then quantized down so people can use them locally. You see those suffixes (q4, q8, fp16)? That's the precision of the weights: q4 keeps roughly 4 bits per weight while fp16 keeps 16, so you only get 80-90% the same output in q4 vs f16. Larger quantized models often outperform their smaller non-quantized counterparts. I mostly recommend q6 for the performance/speed ratio; sadly most models on ollama don't come in q6. Just get q8 if your GPU can fit it.
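
The arithmetic behind those size differences is simple: bytes ≈ parameters × bits-per-weight / 8. A quick sketch; the bits-per-weight values are approximate figures for common llama.cpp quantization schemes, and real GGUF files run slightly larger because of metadata and mixed-precision layers:

    def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
        """Rough model file size in GB: params * bits / 8 bytes."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    # Approximate effective bits per weight for common schemes.
    for name, bpw in [("fp16", 16.0), ("q8_0", 8.5), ("q6_K", 6.6), ("q4_K_M", 4.8)]:
        print(f"7B model at {name}: ~{approx_size_gb(7, bpw):.1f} GB")

    # Prints roughly: fp16 ~14 GB, q8_0 ~7.4 GB, q6_K ~5.8 GB, q4_K_M ~4.2 GB
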
At least as of right now, I think what models people are actually using while coding is often more informative.

I have a 4090 and an i7 with 64GB of DDR4 RAM.

Maybe you should evaluate models based on cloud instance APIs and see what works well in the closed big-model space, then go down into the 70B, 33B, 13B, 7B range of open models and see if anything you can run locally is satisfactory in performance.

Recently I played a bit with LLMs, specifically exploring ways of running the models locally and building prompts using LangChain. As a result I ended up coding a small recommendation system, powered by the Llama3-7b model, which suggests topics to read on HackerNews.

For example, there are 2 coding models (which is what I plan to use my LLM for) and the Llama 2 model.

I am a hobbyist with very little coding skill. I'm new to LLMs and finally set up my own lab using Ollama; specifically Ollama, because that's the easiest way to build with LLMs right now.

The problem is Ollama doesn't allow any roles other than user, system and assistant, while Nous's Hermes 2 model was finetuned on an additional role called <tool>. So you have to wrangle a little bit with it using the user role.
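
A sketch of that wrangling against Ollama's /api/chat endpoint (the endpoint and its message fields are Ollama's documented chat API; the model tag, the fake tool output, and the <tool> wrapper convention are illustrative assumptions): since only system/user/assistant turns are accepted, the tool result gets smuggled in as a user message.

    import requests

    messages = [
        {"role": "system", "content": "You may request tools; results arrive wrapped in <tool> tags."},
        {"role": "user", "content": "What is the weather in Berlin?"},
        {"role": "assistant", "content": '{"tool": "get_weather", "args": {"city": "Berlin"}}'},
        # No "tool" role available, so wrap the tool result in a user turn:
        {"role": "user", "content": '<tool>{"temp_c": 21, "sky": "clear"}</tool>'},
    ]

    r = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "openhermes", "messages": messages, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    print(r.json()["message"]["content"])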
