
Web Search with LLM in Go

Published: 28. October 2024  •  llm, go

This blog post is inspired by this YouTube video from Matt Williams.

Matt shows a retrieval augmented generation (RAG) system that uses a web search to give context to the generation. It uses a local large language model (LLM) running in Ollama. In this blog post, I show you my attempt at building this program in Go.

Run LLM with Ollama

We start with installing Ollama. Follow the instructions on the Ollama website. Ollama runs on macOS, Linux, and Windows. It is a platform and tool for running and experimenting with large language models (LLMs) on your local machine. It offers a command-line tool for downloading, running, and interacting with various language models without an internet connection. The models page shows all the available models that you can download and run.

I'm using Alibaba's Qwen 2.5 7B model for this blog post.

After installing Ollama, you can download the model with the following command:

ollama pull qwen2.5:7b

Before we start the model, we need to increase the context window. By default, Ollama uses a context window size of only 2048 tokens. This needs to be bigger for a RAG workflow because we embed a lot of text in the context. To increase the context window, create a new file named Modelfile somewhere on your system and add the following content:

FROM qwen2.5:7b
PARAMETER num_ctx 32768

32768 is a good starting point for the context window size. According to my information, this model's maximum context window is 128K tokens.

Now, we need to apply this change to the model. Run the following command:

ollama create -f Modelfile qwen2.5:7b

With the following command, we can start the model and interact with it:

ollama run qwen2.5:7b

With this program, we want to ask questions about recent events. When we ask the model such questions without giving it any context, it cannot provide an answer. This makes sense because every LLM has a training cut-off date, in this case October 2023, so the model only knows data that was available up to that date. It is also possible that the model was simply never trained on this kind of information.

>>> Who won the Puzzle World Championship 2024?
As of my last update in October 2023, I don't have specific information about who won the Puzzle World Championship 2024
because detailed results and winners for that event are not available yet. The exact details would typically be announced
closer to or after the event's conclusion.
...

In the following, we will see how to leverage a web search to give context to the model so that it can answer our question.

For web searches, it would be best to have a system that returns results in a format that a program can easily process, such as JSON. One such application is SearXNG, a meta-search engine that aggregates results from more than 70 search services and databases. SearXNG can easily be deployed locally with the help of Docker.

An easy way to get started with SearXNG is to clone the searxng-docker repository with the following command.

git clone https://github.com/searxng/searxng-docker.git

Before starting the service, open the searxng/settings.yml file and change it so it looks like this:

# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true
server:
  # base_url is defined in the SEARXNG_BASE_URL environment variable, see .env and docker-compose.yml
  secret_key: "supersecret"  # change this!
  limiter: false  # can be disabled for a private instance
  image_proxy: true
ui:
  static_use_hash: true
redis:
  url: redis://redis:6379/0
search:
  formats:
    - html
    - json

It is important to change the secret_key; SearXNG will not start if you don't change this value. I also disabled the rate limiter (limiter: false). Don't do this if you plan to expose the service to the internet; I only access it locally here, so it's not a problem. As the last change, I added the search formats html and json, because we want the search results in JSON format for easy processing in the Go program.


Note: On the first run, remove cap_drop: - ALL from the docker-compose.yaml file. SearXNG needs to create an ini file; without proper permission, this will fail. After the first run, add the configuration back.

We can now start the service with docker compose:

docker compose up -d redis searxng

The docker compose file also contains a configuration for Caddy, but I don't need it because I only want to connect to the service locally. Caddy is only used to terminate SSL connections. If you plan to expose the service to the internet, you can use Caddy as the reverse proxy.

Test if the service is running by opening the following URL in your browser: http://localhost:8080/. If port 8080 on your system is already in use, change the port in the docker-compose.yaml file.
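
You can also check that the JSON output works by querying the search API directly (the /search path is an assumption based on the default SearXNG setup):

curl "http://localhost:8080/search?q=test&format=json"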

You can find more information about searxng-docker in the documentation.

With the LLM and web search engine in place, we can start building the program that leverages both systems.

Go Program

The workflow of the Go program is as follows:

  1. Ask the LLM to reformulate the user question into a search query.
  2. Search the web with the search query.
  3. Extract the text of the top 3 search results.
  4. Ask the LLM the user question with the extracted text as context.
  5. Print the answer.

First, I added the following dependencies to the project.

go get github.com/go-shiori/go-readability
go get github.com/ollama/ollama

go-readability is a library that extracts the main readable content of an HTML page and removes all the clutter like buttons, ads, background images, scripts, etc. go-readability is a port of the JavaScript project Readability.js from Mozilla.

I also added ollama to the project, which contains a client that simplifies the interaction with the LLM. If you want to access models on Ollama without adding the whole Ollama package to your Go program, check out this blog post, which only uses the Go standard library to interact with Ollama.
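
For illustration, here is a minimal, self-contained sketch of that standard-library approach, posting a non-streaming request to Ollama's /api/chat endpoint. This is only a sketch, not the code from the linked post, and the request and response types are trimmed to the fields used here:

package main

import (
  "bytes"
  "encoding/json"
  "fmt"
  "log"
  "net/http"
)

type chatMessage struct {
  Role    string `json:"role"`
  Content string `json:"content"`
}

type chatRequest struct {
  Model    string        `json:"model"`
  Messages []chatMessage `json:"messages"`
  Stream   bool          `json:"stream"`
}

type chatResponse struct {
  Message chatMessage `json:"message"`
}

func main() {
  // Build a non-streaming chat request for the locally running model.
  reqBody, err := json.Marshal(chatRequest{
    Model:    "qwen2.5:7b",
    Messages: []chatMessage{{Role: "user", Content: "Hello"}},
    Stream:   false,
  })
  if err != nil {
    log.Fatal(err)
  }

  // Ollama listens on http://localhost:11434 by default.
  resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(reqBody))
  if err != nil {
    log.Fatal(err)
  }
  defer resp.Body.Close()

  var chatResp chatResponse
  if err := json.NewDecoder(resp.Body).Decode(&chatResp); err != nil {
    log.Fatal(err)
  }
  fmt.Println(chatResp.Message.Content)
}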

main

The main function of the program implements the workflow described above:

func main() {
  ollamaURL, err := url.Parse(ollamaBaseURL)
  if err != nil {
    log.Fatal(err)
  }
  client := api.NewClient(ollamaURL, httpClient)

  query := "Who won the Puzzle World Championship 2024?"
  searchQuery := getSearchQuery(client, query)
  fmt.Println("Query:", searchQuery)

  searchResponse, err := webSearch(searchQuery)
  if err != nil {
    log.Fatal(err)
  }

  searchContext := buildSearchContext(searchResponse.Results)
  answer := getAnswer(client, query, searchContext)
  fmt.Println(answer)
}

main.go

getSearchQuery reformulates the user question into a search query, webSearch sends the query to the web search engine, buildSearchContext extracts the text of the top 3 search results, and finally getAnswer asks the LLM the user question with the extracted text as context.

The call to api.NewClient creates a new client to access Ollama. It takes the Ollama URL, which by default is http://localhost:11434, and an instance of http.Client as arguments. Because an LLM running on your local computer can take a while to answer, set the timeout of the HTTP client to an appropriate value. Here, I set it to 10 minutes.

var httpClient = &http.Client{
  Timeout: 10 * time.Minute,
}

main.go
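
The excerpts also reference a few package-level constants (ollamaBaseURL, searchBaseURL, and maxResults) that are not shown here. A plausible definition, where the SearXNG search endpoint and the number of results are assumptions:

// Plausible definitions for the constants used in the excerpts; the
// SearXNG search endpoint path and the number of results are assumptions.
const (
  ollamaBaseURL = "http://localhost:11434"
  searchBaseURL = "http://localhost:8080/search"
  maxResults    = 3
)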

getSearchQuery

The getSearchQuery function reformulates the user question into a search query. The function uses the LLM to generate the search query.

func getSearchQuery(client *api.Client, query string) string {
  messages := []api.Message{
    {
      Role:    "system",
      Content: "You are a professional web searcher.",
    },
    {
      Role:    "user",
      Content: "Reformulate the following user prompt into a search query and return it.Nothing else.\n\n" + query,
    },
  }

  response := executeChat(client, messages)
  return strings.Trim(response, "\"")
}

main.go

executeChat sends the messages to the LLM running on Ollama and returns the response. You can find the implementation of executeChat here.
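
Although the full implementation is linked above, a minimal sketch of what executeChat could look like, using the Chat method of the Ollama Go client, may help; the model name and the non-streaming setting are assumptions:

// A minimal sketch of executeChat; the model name and the non-streaming
// setting are assumptions, the author's implementation is linked above.
func executeChat(client *api.Client, messages []api.Message) string {
  var sb strings.Builder
  stream := false
  req := &api.ChatRequest{
    Model:    "qwen2.5:7b",
    Messages: messages,
    Stream:   &stream,
  }
  err := client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
    // With streaming disabled, this callback is invoked once with the
    // complete response.
    sb.WriteString(resp.Message.Content)
    return nil
  })
  if err != nil {
    log.Fatal(err)
  }
  return sb.String()
}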

The program removes any leading and trailing quotes from the response. I noticed that sometimes the answer is quoted, which would result in search results that are not what we expect.

webSearch

The webSearch function sends the query from the previous step to SearXNG and decodes the response into a SearchResponse struct.

func webSearch(query string) (*SearchResponse, error) {
  encodedQuery := url.QueryEscape(query)
  requestURL := fmt.Sprintf("%s?q=%s&format=json", searchBaseURL, encodedQuery)

  response, err := httpClient.Get(requestURL)
  if err != nil {
    return nil, err
  }
  defer response.Body.Close()

  var searchResponse SearchResponse
  if err := json.NewDecoder(response.Body).Decode(&searchResponse); err != nil {
    return nil, err
  }

  return &searchResponse, nil
}

main.go

Accessing SearXNG is straightforward. Just send a GET request with the search query in the q parameter and add &format=json to get the results in JSON format.
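
The SearchResponse and Result structs are not shown in the excerpt. A plausible definition that maps only the fields of the SearXNG JSON response the program needs (the exact field set is an assumption):

// A possible shape of the SearchResponse and Result structs used by
// webSearch and buildSearchContext; only the SearXNG JSON fields this
// program needs are mapped (the field set is an assumption).
type SearchResponse struct {
  Query   string   `json:"query"`
  Results []Result `json:"results"`
}

type Result struct {
  Title   string `json:"title"`
  URL     string `json:"url"`
  Content string `json:"content"`
}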

buildSearchContext

This is the function where the application extracts the text from the top 3 search results. It utilizes go-readability to extract the main content of each page. To speed up the process, it fetches the pages concurrently.

func buildSearchContext(results []Result) string {
  var wg sync.WaitGroup
  resultChan := make(chan string, maxResults)

  for i, result := range results {
    if i >= maxResults {
      break
    }

    wg.Add(1)
    go func(result Result) {
      defer wg.Done()
      content, err := fetchTextContent(result.URL)
      if err != nil {
        log.Printf("Error fetching content for URL %s: %v", result.URL, err)
        return
      }
      resultChan <- fmt.Sprintf("%s\n%s\n\n", result.URL, content)
    }(result)
  }

  wg.Wait()
  close(resultChan)

  var contextSB strings.Builder
  for res := range resultChan {
    contextSB.WriteString(res)
  }
  return contextSB.String()
}

func fetchTextContent(url string) (string, error) {
  fmt.Println("Fetching content for URL:", url)
  article, err := readability.FromURL(url, 30*time.Second)
  if err != nil {
    return "", err
  }
  return article.TextContent, nil
}

main.go

getAnswer

The final piece of the puzzle is the getAnswer function. It sends the user question and the context to the LLM and returns the answer.

func getAnswer(client *api.Client, query, context string) string {
  messages := []api.Message{
    {
      Role: "user",
      Content: fmt.Sprintf("%s\n\nOnly return answer based on the context. "+
        "If you don't know return I don't know:\n###%s\n###", query, context),
    },
  }

  return executeChat(client, messages)
}

main.go

Demo

Now, we can run the program and see if it works. The program should return the answer to the user's question.

go run .

The output looks like this:

Query: Puzzle World Championship 2024 winner
Fetching content for URL: https://www.worldjigsawpuzzle.org/wjpc/2024/individual/final
Fetching content for URL: https://www.worldjigsawpuzzle.org/
Fetching content for URL: https://en.wikipedia.org/wiki/2024_World_Jigsaw_Puzzle_Championship
Based on the provided information, Kristin Thuv from Norway won the Individual event of the 2024 World Jigsaw Puzzle Championship with a time of 00:37:58.

After "Query:" we see the reformulated search query. Then, it lists the 3 URLs from which the program extracted the text. The last line shows the answer from the LLM to the user question.

Conclusion

We have seen how we can leverage a web search engine to give context to a large language model, how we can run an LLM locally with Ollama, and how we can interact with the LLM from a Go program.

This simple example showed how easy it is to build a retrieval augmented generation (RAG) workflow that answers questions not covered by the LLM's training data.