Data sensitivity is a serious issue. Many organizations, especially medical establishments, prioritize data privacy above everything else. That concern has only grown in the age of AI, where popular chat services like ChatGPT and Claude may use the data you submit to train their models, depending on your plan and settings.
Professionals like doctors, lawyers, and business consultants frequently require AI support but can't afford to compromise confidentiality by uploading sensitive files to cloud platforms. Local AI processing offers a vital solution, allowing them to analyze documents and interact with AI securely on-site, keeping all data private.
One such local AI solution is LM Studio. You download this software to your computer, and it acts as your own personal AI assistant, doing locally what you would normally reach for ChatGPT to do.
So, this tutorial gives you the exact steps you need to install and configure LM Studio on your computer (PC or Mac) without getting overwhelmed.
By the end of this tutorial, you’ll learn how to:
- Install and configure LM Studio
- Download and set up AI models
- Interact with local AI models
- Chat with your documents using retrieval-augmented generation (RAG)
- Access advanced tools like structured outputs and a local API server
Let’s get into it. Shall we?
Why would I install an AI on my computer?
Running an AI locally typically involves downloading and operating a Large Language Model (LLM) directly on your Mac or PC. Given their substantial size, these models demand significant computing resources to function effectively. However, the major benefit is complete autonomy over your data. There’s no necessity to transmit your information to major corporations like Google or OpenAI, as everything remains securely on your device.
Since not all software is the same, local LLMs vary in size and functionality. Compact and efficient models are compatible with standard consumer hardware, making them accessible to many users. In contrast, more robust and advanced models require powerful GPUs to perform optimally.
This method presents several key benefits:
- Improved Privacy: All your data and inquiries stay on your device, guaranteeing total privacy.
- Offline Accessibility: AI features are available without an internet connection, perfect for remote work or traveling.
- Personalization: You can tailor local models to fit your specific requirements or areas of expertise.
In this guide, we will utilize LM Studio to operate LLMs locally on your PC. LM Studio distinguishes itself among local AI solutions through its intuitive interface and robust capabilities.
While alternatives like Ollama might cater more to developers and are typically accessible only via command-line interfaces, LM Studio provides a conversational interface that balances ease of use with sophisticated features. This makes it suitable for everyday users and those who desire more intricate control over their local AI models.
Step 1 - Install and configure LM Studio
Let’s get into the exciting part. Head over to the LM Studio website and download the software. Whether you have a PC, a Mac, or a Linux machine, LM Studio supports all major operating systems, so select the appropriate download. We are using a Mac, so we’ll show you how to install and set up LM Studio on macOS, but don’t worry if you have a PC: the procedure is essentially the same.

After downloading the appropriate version for your operating system, go ahead and install it.

Launch the newly installed LM Studio. You’ll be greeted with a ChatGPT-style interface. Let’s explore the main components we will be working with.

Sidebar Navigation
The sidebar on the left offers the main navigation:
- Chat: This will be your main AI conversation dashboard. If you have used ChatGPT, you’ll recognize this feature instantly. This tab lists all your conversations, new and old; here you can start new discussions, return to existing ones, and interact with your loaded AI models.
- Developer: This feature is for advanced users. Click here to access development tools and API settings. This tab gives you everything you need to integrate LM Studio into your own applications: a local API server, API documentation, testing tools, and endpoint configuration (see the sketch after this list).
- My Models: This section acts as your personal library for all the models you’ve downloaded. You can inspect each model's details, manage how much storage they use, and view which models are currently active. Additionally, it displays the size and format of every model.
- Discover: Navigate through specially curated collections, search for specific Hugging Face models, and access detailed information regarding each model’s capabilities, required storage space, and compatibility with your system.
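To give you a taste of what the Developer tab unlocks, here is a minimal sketch that lists the models your local server exposes. It assumes you have started the server from the Developer tab and that it is listening on LM Studio’s default port, 1234.

```python
import requests

# LM Studio's local server speaks the OpenAI API. By default it listens
# on http://localhost:1234; start it from the Developer tab first.
BASE_URL = "http://localhost:1234/v1"

# Ask the server which models it can currently serve.
response = requests.get(f"{BASE_URL}/models")
response.raise_for_status()

for model in response.json()["data"]:
    print(model["id"])
```

If the request fails, check that the server is running and that the port matches the one shown in the Developer tab.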
The interface is designed to be intuitive without sacrificing functionality. As you work with LM Studio, these sections come together to give you a seamless experience running AI models locally.
Let’s start with the essentials; as you become familiar with the platform, we will move on to the more advanced features.
Step 2 - Download and set up AI models
Once you have a good grasp of the interface, the next step is to download a model from Hugging Face—a widely used platform for sharing machine learning models—to run locally on your machine. For most standard consumer hardware, models with 1 billion to 13 billion parameters are ideal.
To get you started with your own ChatGPT-style app, we’ll select the Meta Llama 3.1 8B model, which is well-suited for most consumer-grade computers. If your PC or Mac has lower specifications, consider a smaller model like Llama 3.2 3B, which has only 3 billion parameters and runs smoothly on modest hardware.
Here’s the exciting part. Navigate to the "Discover" section in the sidebar. Here, you can explore AI models featured under "Staff picks" or browse the entire collection using the "Search all" option. The models are listed in the Discover tab, and you can pick the ones you want to download. There’s plenty to choose from; if you have 1 TB of storage to spare, download them all!
Let’s proceed by downloading the Llama 3.1 8B model. Simply enter the model name into the search bar and follow the prompts to initiate the download.

Good to know: Since we are using a Mac, we chose the MLX models to ensure smoother performance; they are optimized specifically for Apple Silicon hardware and offer better performance for local deployments. If you’re on a Windows or Linux PC, choose GGUF models instead; you’ll still have a wide array of models to pick from.
Locate the download button adjacent to your selected version and click it to initiate the download process. The model will occupy several gigabytes of space, so ensure you have sufficient storage and a reliable internet connection.

After the download finishes, navigate to the "Chat" tab located in the left sidebar to access the chat interface. At the top of this screen, there will be a dropdown menu. Click on it to view the list of available models. Locate the Llama model you just downloaded and select it by clicking on its name to load it into the chat environment.

Important to know: When engaging in longer conversations within LM Studio, it's advisable to set your context length to approximately 8k tokens to better handle extended interactions. However, proceed cautiously, as this setting can greatly increase memory usage and may cause the application to become unstable. Additionally, it's recommended to keep the GPU offload settings at their default values to ensure optimal performance.

Step 3 - Interact with local AI models
Now that the model is loaded, let’s start asking some questions. LM Studio’s interface is intuitive, easy to use, and resembles popular AI chatbots like Claude and ChatGPT.
To get started, type a question or message in the text box at the bottom. For example, you might try asking:
Prompt:
Can you tell me what a local AI model can do to answer complex mathematical questions?
Engage in the conversation naturally, asking follow-up questions or introducing new topics as you go. The AI considers the whole discussion when generating relevant responses.
To open a new conversation, click on the "+" symbol at the top of the conversation history on the left side.
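The Chat tab is the easiest way to converse, but the same kind of interaction can be scripted against the local API server once you have started it from the Developer tab. Here is a minimal sketch, assuming the default port (1234); the model id is illustrative, so substitute the one listed under /v1/models on your machine.

```python
import requests

BASE_URL = "http://localhost:1234/v1"

# Keep the running conversation as a list of messages; the model sees
# the whole history on every turn, just like in the Chat tab.
messages = [{
    "role": "user",
    "content": "Can you tell me what a local AI model can do to answer "
               "complex mathematical questions?",
}]

resp = requests.post(f"{BASE_URL}/chat/completions",
                     json={"model": "meta-llama-3.1-8b-instruct",  # illustrative id
                           "messages": messages})
resp.raise_for_status()
reply = resp.json()["choices"][0]["message"]["content"]
print(reply)

# To continue the conversation, append the reply and your next question
# to `messages` and send the request again.
messages.append({"role": "assistant", "content": reply})
```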

Step 4 - Chat with your documents using RAG
One of LM Studio's prominent features is its capability to interact with documents directly on your computer, thanks to Retrieval-Augmented Generation (RAG). This approach is especially beneficial for handling sensitive or confidential documents since all processing happens locally, ensuring your files remain private and secure.
RAG works by blending the AI's broad knowledge with specific insights from your documents. When you ask a question, it examines your files for relevant content and uses both the information it finds and its built-in knowledge to provide a precise, personalized answer.
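LM Studio handles all of this for you in the chat interface, but to make the retrieval idea concrete, here is a minimal sketch of the core step, assuming you have an embedding model loaded and the local server running. The model id and the document chunks are illustrative, and LM Studio's actual pipeline may differ in its details.

```python
import math
import requests

BASE_URL = "http://localhost:1234/v1"
EMBED_MODEL = "text-embedding-nomic-embed-text-v1.5"  # illustrative; use a model you have loaded

def embed(text: str) -> list[float]:
    # Turn text into a vector using the local embeddings endpoint.
    resp = requests.post(f"{BASE_URL}/embeddings",
                         json={"model": EMBED_MODEL, "input": text})
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: how closely two vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend these chunks came from your uploaded document.
chunks = [
    "Remote employees must log hours in the time-tracking portal daily.",
    "The cafeteria is open from 8 a.m. to 3 p.m. on weekdays.",
    "Overtime for remote staff requires prior manager approval.",
]

question = "What are the time-tracking rules for remote work?"
q_vec = embed(question)

# Rank chunks by similarity to the question. The best matches are what
# gets placed in the model's prompt alongside your question.
ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
print(ranked[0])
```

Only the winning chunks, not the whole document, are what the model actually reads, which is why RAG scales to files far larger than the model's context window.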
To use RAG for chatting with your documents in LM Studio, follow these steps:
- Locate the "Upload File" option in the chat area.
- You can upload up to five files simultaneously, with a total size limit of 30 MB. LM Studio accepts formats like PDF, DOCX, TXT, and CSV.
- After uploading your files, start asking questions about their contents. The AI will scan through the documents to retrieve relevant information and use it to answer you effectively.
How about we try an example of the RAG capabilities using internal policy documents, such as an employee handbook:
Prompt:
Identify and extract all sections in our employee handbook that cover remote work time-tracking requirements. Then, review and compare these sections with the guidelines. Highlight any areas where our current policy may require updates to stay compliant.


Good to know: To get the best results with RAG, try to ask very specific questions about your documents. The more detail you provide, the easier it is for the system to extract the most relevant information and give you a clear, accurate answer.
Step 5 - Access advanced tools like structured outputs and a local API server
How about we try some advanced features and key settings to tweak our model’s behavior:
To access the advanced controls, click the ‘lab’ icon at the top right corner of the chat interface; the settings appear on the right side under ‘Advanced configuration.’

- Temperature: Adjust this slider to control how random the AI’s responses are. Lower values (near 0) make responses more focused and predictable, while higher values (closer to 1) allow for more variety and creativity.
- Top P and Top K: These settings influence how the AI picks its next words. Adjusting them helps you balance consistency against creative variation.
- System Prompt: Here, you can set special instructions for the AI. Unlike regular messages, these instructions guide the AI throughout your conversation, ensuring responses stay aligned with your guidelines.
- Structured Output: This feature formats the AI’s responses clearly and consistently (usually in JSON), which can be helpful for organized data. You can see it in action in the sketch after this list.
- Limit Response Length: This setting lets you control the length of the AI’s responses by setting a token limit, helping you get shorter or more detailed answers based on your preference.
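To see several of these settings working together, here is a hedged sketch that calls the local API server with a low temperature, a token limit, and a JSON Schema for structured output. The model id and schema are illustrative, the server must be running from the Developer tab, and structured-output support can vary from model to model.

```python
import json
import requests

BASE_URL = "http://localhost:1234/v1"

# A JSON Schema describing the exact shape we want the reply to take.
schema = {
    "name": "local_llm_benefits",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "key_points": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["summary", "key_points"],
    },
}

payload = {
    "model": "meta-llama-3.1-8b-instruct",  # illustrative; check /v1/models
    "messages": [
        {"role": "system", "content": "You are a concise research assistant."},
        {"role": "user", "content": "Summarize the benefits of running LLMs locally."},
    ],
    "temperature": 0.2,   # low temperature: focused, predictable output
    "max_tokens": 300,    # cap the response length
    "response_format": {"type": "json_schema", "json_schema": schema},
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload)
resp.raise_for_status()
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```

When the model honors the schema, the reply parses cleanly as JSON, which makes it easy to feed into other tools.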

In this guide, we’ve gone through the steps to set up and use local LLMs in LM Studio, allowing you to run AI models on your computer. This approach keeps your data private and gives you full control, from the initial setup to advanced options like RAG.
Running AI locally opens up many possibilities, especially for private applications, offline use, and personal customization. As you familiarize yourself with LM Studio, experiment with different models and settings to find the best setup.