
Running LLM-Generated Go Code in a Docker Container

Published: 17. January 2025  •  go, llm

In this article, I will show you how to write a Go program that generates Go code with the help of a large language model (LLM) and run this generated code in a Docker container.

This could be part of an LLM agent system or a tool in a function/tool calling request. The idea is to give the LLM a tool that allows it to write a program that solves a certain task or retrieves certain information it needs to answer a prompt. For example, to answer a complex mathematical question or to fetch data from a data source that is not directly accessible by the LLM.

LangChain, for example, provides this tool for that purpose: Python REPL.

Most often, these tools generate code in an interpreted language such as Python or JavaScript. These languages have the advantage that they don't need a compilation step.

For this blog post, I wanted to see how this works with generated Go code instead. Go is a compiled language, but the Go compiler is very fast, so there should not be a big difference in execution time. Depending on the generated code, we might even see faster execution times.

All these tools that generate and execute code have in common that they don't run the code directly on the host machine because that would be a security risk. The code could do anything on the host machine. Instead, they run the code in a sandboxed environment. A straightforward way to do this is to run the code in a Docker container.

Access Docker Host

Docker consists of a client and a server. A client, usually the Docker command line tool, sends commands to the server, which executes them.

Because the main program that orchestrates everything is written in Go, it can use a well-maintained Go client library for Docker. The library comes directly from the developers of Docker and can be installed with the following command:

go get github.com/docker/docker

This is the same library the Docker command line tool uses, written in Go.

Hello World

Let's start with a simple example that shows how to pull an image from Docker Hub, create a container from it, and start the container.

The first step is to create a client object that a program can use to interact with Docker.

  ctx := context.Background()
  cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
  if err != nil {
    panic(err)
  }
  defer cli.Close()

main.go

Next, the application pulls the image from Docker Hub. The ImagePull method runs asynchronously and returns a reader from which the output of the pull operation can be read. Because the operation is asynchronous, the application needs to wait until the pull is complete. The code does this by reading from the reader until it returns EOF.

  reader, err := cli.ImagePull(ctx, "docker.io/library/alpine", image.PullOptions{})
  if err != nil {
    panic(err)
  }

  defer reader.Close()
  _, err = io.Copy(os.Stdout, reader)
  if err != nil {
    panic(err)
  }

main.go

The following code creates a container from the image with the ContainerCreate method. This method requires the image name and the command that should be executed. In this example, this is a simple echo command that prints "hello world" to the standard output.

The method returns a response object containing the created container's ID.

  resp, err := cli.ContainerCreate(ctx, &container.Config{
    Image: "alpine",
    Cmd:   []string{"echo", "hello world"},
    Tty:   false,
  }, nil, nil, nil, "")
  if err != nil {
    panic(err)
  }

main.go

Next, the ContainerStart method starts the container. This method also runs asynchronously. The docker library provides a convenient method, ContainerWait that waits until the container reaches a certain state. In this example, it waits until the container is no longer running.

  if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
    panic(err)
  }

  statusCh, errCh := cli.ContainerWait(ctx, resp.ID, container.WaitConditionNotRunning)
  select {
  case err := <-errCh:
    if err != nil {
      panic(err)
    }
  case <-statusCh:
  }

main.go

The Go code runner needs a way to read the output of the container. For this purpose, the Docker library provides the ContainerLogs method. It returns a reader over the container's stdout and stderr streams, multiplexed into a single stream.

An application can specify with LogsOptions whether it wants to read both streams or just one.

The Docker library provides a function, StdCopy, that demultiplexes this combined stream into separate stdout and stderr writers. This example simply redirects the output to the standard output and error streams of the host program.

  out, err := cli.ContainerLogs(ctx, resp.ID, container.LogsOptions{ShowStdout: true, ShowStderr: true})
  if err != nil {
    panic(err)
  }

  _, err = stdcopy.StdCopy(os.Stdout, os.Stderr, out)
  if err != nil {
    panic(err)
  }

main.go

The last step is to remove the container. The program does this with the help of the ContainerRemove method.

  if err := cli.ContainerRemove(ctx, resp.ID, container.RemoveOptions{}); err != nil {
    panic(err)
  }

main.go

The equivalent Docker command line tool commands for this example would be:

docker pull alpine
docker create --name my-container alpine echo "hello world"
docker start my-container
docker logs my-container
docker rm my-container

You can see that all the docker subcommands have a corresponding method in the Docker client library.

Run Go code

Based on the previous hello world example, we can now easily write a Go program that runs Go code in a Docker container. I wrote the RunCodeInDocker method for this blog post.

The method receives a Go program as a string and a timeout as parameters. Because the LLM might generate code that runs into an endless loop or takes too long to execute, the application needs a way to stop the execution. The code leverages WithTimeout from the context package for this: once the timeout expires, the context is canceled and any in-flight Docker API call is aborted.

The whole process of instantiating the client and pulling the image from Docker Hub is the same as in the previous example. The only difference here is that the method pulls the image golang:1.23-alpine3.21 that contains the whole Go toolchain.

func RunCodeInDocker(code string, timeout time.Duration) (string, string, error) {
  ctx, cancel := context.WithTimeout(context.Background(), timeout)
  defer cancel()

  cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
  if err != nil {
    return "", "", err
  }
  defer cli.Close()

  reader, err := cli.ImagePull(ctx, "docker.io/library/golang:1.23-alpine3.21", image.PullOptions{})
  if err != nil {
    return "", "", err
  }
  defer reader.Close()

  _, err = io.Copy(io.Discard, reader)
  if err != nil {
    return "", "", err
  }

dockerrun.go

The interesting part is the ContainerCreate method. Here, the code specifies the command that Docker must execute. This command is a one-liner shell script that creates a directory, initializes a Go module, writes the given code into a file, runs go mod tidy to download the dependencies, builds the program, and runs it. The purpose of using a Go module here is that whenever the generated code references a third-party library, the Go module system downloads the library automatically.

  resp, err := cli.ContainerCreate(ctx, &container.Config{
    Image: "golang:1.23-alpine3.21",
    Cmd: []string{
      "/bin/sh",
      "-c",
      `mkdir -p /code && cd /code && go mod init gorun && echo '` + code + `' > app.go && go mod tidy && go build -o app app.go && ./app`,
    },
    Tty: false,
  }, nil, nil, nil, "")
  if err != nil {
    return "", "", err
  }
  defer func() {
    _ = cli.ContainerRemove(ctx, resp.ID, container.RemoveOptions{})
  }()

  if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
    return "", "", err
  }

  statusCh, errCh := cli.ContainerWait(ctx, resp.ID, container.WaitConditionNotRunning)
  select {
  case err := <-errCh:
    if err != nil {
      return "", "", err
    }
  case <-statusCh:
  case <-ctx.Done():
    return "", "", ctx.Err()
  }

dockerrun.go

After starting the container and waiting for it to finish, the program reads the output of the program from the stdout and stderr streams and returns them to the caller.

  out, err := cli.ContainerLogs(ctx, resp.ID, container.LogsOptions{ShowStdout: true, ShowStderr: true})
  if err != nil {
    return "", "", err
  }

  var stdoutBuf, stderrBuf bytes.Buffer
  _, err = stdcopy.StdCopy(&stdoutBuf, &stderrBuf, out)
  if err != nil {
    return "", "", err
  }

  stderr := cleanStderr(stderrBuf.String())

  return strings.TrimSpace(stdoutBuf.String()), strings.TrimSpace(stderr), nil

dockerrun.go

One problem I had here was that the go commands like go mod init and go mod tidy write their info messages into the stderr stream. I only wanted real errors in stderr, so the method cleanStderr removes these info messages from the stream. You can find the implementation of this method here

Generate Go code

The next step is to write the code that generates the Go code that should be executed in the Docker container. This code depends on the LLM that you use. For this blog post, I use Google's Gemini 2.0 Flash model, which is currently (January 2025) in beta and free to use.

The following demo application uses the Go SDK for Google Generative AI library for interacting with the Gemini model. You install this library with the following command:

go get github.com/google/generative-ai-go

The following code shows how to generate Go code with the help of the Gemini model. The code sends a prompt to the model and receives the generated code as a response. The code then cleans up the response and returns the generated code.

func GenCode(userPrompt string, history []*genai.Content) string {
  ctx := context.Background()
  apiKey, ok := os.LookupEnv("GEMINI_API_KEY")
  if !ok {
    log.Fatalln("Environment variable GEMINI_API_KEY not set")
  }

  client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
  if err != nil {
    log.Fatalf("Error creating client: %v", err)
  }
  defer client.Close()

  model := client.GenerativeModel("gemini-2.0-flash-exp")
  model.SetTemperature(0.7)
  model.SetMaxOutputTokens(8192)
  model.ResponseMIMEType = "text/plain"

  systemPrompt := `
Write a Go program that meets the following requirements:
- The code should follow best practices.
- Ensure the code is efficient and optimized.
- Handle errors gracefully and provide meaningful error messages.
- Use idiomatic Go constructs and conventions.
- Do not comment the code
- Only return the Go code. Nothing else should be returned.
`
  model.SystemInstruction = &genai.Content{
    Parts: []genai.Part{genai.Text(systemPrompt)},
  }

  session := model.StartChat()
  session.History = history
  resp, err := session.SendMessage(ctx, genai.Text(userPrompt))
  if err != nil {
    log.Fatalf("Error sending message to LLM: %v", err)
  }

  if len(resp.Candidates) > 0 && len(resp.Candidates[0].Content.Parts) > 0 {
    llmOutput := ""
    for _, part := range resp.Candidates[0].Content.Parts {
      llmOutput += fmt.Sprintf("%v", part)
    }

    return cleanup(llmOutput)
  }

  log.Fatalln("No response received from LLM")
  return ""
}

func cleanup(code string) string {
  updatedCode := strings.TrimPrefix(code, "```go")
  return strings.TrimSuffix(updatedCode, "```\n")
}

codegen.go

Gemini often wraps the generated code in a Markdown code block. The cleanup method removes the code block.

Everything together

Now we can put everything together. The following method, GenRunWithRetries, expects a prompt, sends it to the LLM to generate the code, then executes the generated code in a Docker container and returns the output to the caller.

One additional feature here is that the method implements a feedback loop in case the generated code does not compile or crashes when run. The output of the failed run is fed back to the LLM, which can then try to fix the problem based on this feedback. When you implement such a feedback loop, it is important to add a stop condition; otherwise, it might run into an endless loop.

This method implements a simple stop condition based on the number of retries. The method receives the maximum number of retries as a parameter. If the maximum number of retries is reached, the method stops and returns an error message.

The code calls the two methods you saw before, GenCode and RunCodeInDocker, to generate the code and to run it.

func GenRunWithRetries(userPrompt string, maxRetries int) (string, error) {
  if maxRetries <= 0 {
    return "", fmt.Errorf("maxRetries must be greater than 0, got %d", maxRetries)
  }

  var history []*genai.Content
  retryCount := 0

  for {
    fmt.Println("Generating code...")
    generatedCode := GenCode(userPrompt, history)

    retryCount++
    fmt.Println("Running code in Docker...")
    stdout, stderr, err := RunCodeInDocker(generatedCode, 5*time.Minute)

retry.go

If the RunCodeInDocker method returns an error or the stderr stream is not empty, the method checks if the maximum number of retries is reached. If not, it feeds the error message to the LLM and tells the LLM to fix the error and return the code.

    if err != nil || stderr != "" {
      if retryCount >= maxRetries {
        if err != nil {
          return "", fmt.Errorf("error running code in Docker after %d retries: %v", maxRetries, err)
        }
        return "", fmt.Errorf("error running code in Docker after %d retries: %s", maxRetries, stderr)
      }

      errorMsg := ""
      if err != nil {
        errorMsg = fmt.Sprintf("Docker run failed with error: %v", err)
      } else {
        errorMsg = fmt.Sprintf("Docker run failed with stderr: %s", stderr)
      }

      fmt.Printf("Attempt %d failed: %s\n", retryCount, errorMsg)
      fmt.Println("Regenerating code with error context...")

      history = append(history,
        &genai.Content{
          Role:  "user",
          Parts: []genai.Part{genai.Text(userPrompt)},
        },
        &genai.Content{
          Role:  "model",
          Parts: []genai.Part{genai.Text(generatedCode)},
        })

      userPrompt = fmt.Sprintf(
        "The previous code execution failed with:\n\n%s\n\n"+
          "Please analyze the error and generate corrected Go code that:\n"+
          "1. Fixes the specific error\n"+
          "2. Maintains the original functionality", errorMsg)

      continue
    }

    return stdout, nil
  }

retry.go

Tool Calling

The following example shows how to use the methods presented in the previous parts of this article as a tool in a tool-calling request.

If you are unfamiliar with tool (or function) calling, here is a short explanation. Tool calling is a method to invoke external tools or functions that are not directly implemented within the LLM, allowing it to perform tasks beyond its built-in capabilities. This can include complex operations or accessing external systems to retrieve information or perform computations.

Note that it's not the LLM that calls the tool. Instead, the LLM sends back a special response to the caller that contains the information needed to call the tool. The caller then calls the tool and sends the output back to the LLM.

Here is an overview of how this works:

1. The caller sends the initial request to the LLM.
2. The LLM sends back a tool-calling response.
3. The caller calls the tool.
4. The tool returns its output to the caller.
5. The caller sends the tool output back to the LLM.

After this short introduction, let's see how this can be implemented in Go and the generative-ai-go library.

The first request a client sends to the LLM is a normal prompt that contains a list of tools that the LLM can use if it's not able to answer the prompt directly. This description should be quite detailed so that the LLM understands what the tool can do and when to use it.

In this example, the code generator tool does not expect the generated code from the LLM. Instead, it tells the LLM to write a detailed description that can then be sent to a code generator that generates and runs the code.

  ctx := context.Background()
  client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("GEMINI_API_KEY")))
  if err != nil {
    log.Fatal(err)
  }
  defer client.Close()

  schema := &genai.Schema{
    Type: genai.TypeObject,
    Properties: map[string]*genai.Schema{
      "detailedDescription": {
        Type:        genai.TypeString,
        Description: "The detailed description about a program the gen_code_tool should generate",
      },
    },
    Required: []string{"detailedDescription"},
  }

  generalPurposeTool := &genai.Tool{
    FunctionDeclarations: []*genai.FunctionDeclaration{{
      Name: "gen_code_tool",
      Description: `This is a general-purpose tool that can be used to generate code for various tasks,
including searching for information about competitions and events and accessing public APIs and accessing real-time information
and has access to the Internet. The tool generates Go code for the given detailed description and runs it in a Docker container.
It returns the output of the code execution. `,
      Parameters: schema,
    }},
  }

main.go

The following code sends the prompt, including the tool description from above, to the LLM.

  model := client.GenerativeModel("gemini-2.0-flash-exp")

  model.Tools = []*genai.Tool{generalPurposeTool}
  session := model.StartChat()
  // userPrompt := "What is the current date and time"
  userPrompt := "Is the number 1201281 prime? List all divisors of this number."
  res, err := session.SendMessage(ctx, genai.Text(userPrompt))
  if err != nil {
    log.Fatalf("session.SendMessage: %v", err)
  }

main.go

After receiving the response, the code checks whether it contains a tool-calling response. Not every request that offers tools results in a tool call: if the LLM can answer the prompt directly with its internal knowledge, it returns a normal response.

  part := res.Candidates[0].Content.Parts[0]
  funcall, ok := part.(genai.FunctionCall)
  if !ok {
    fmt.Println("No Function Calling")
    fmt.Println(part)
    return
  }

main.go

If it is a tool calling response, the application extracts the expected detailed description and passes it to the GenRunWithRetries method, which generates the code, runs the code in a Docker container, and returns the output.

  if funcall.Name != generalPurposeTool.FunctionDeclarations[0].Name {
    log.Fatalf("expected %q, got %q", generalPurposeTool.FunctionDeclarations[0].Name, funcall.Name)
  }

  description, ok := funcall.Args["detailedDescription"].(string)
  if !ok {
    log.Fatalf("expected string: %v", funcall.Args["detailedDescription"])
  }

  result, err := internal.GenRunWithRetries(description, 3)
  if err != nil {
    log.Fatal(err)
  }

main.go

The following code sends the output of the tool back to the LLM. The LLM can then use this output to answer the initial prompt. Because LLMs are stateless, this second request must contain the whole history of the conversation. The generative-ai-go library takes care of this internally with the help of a session object.

  res, err = session.SendMessage(ctx, genai.FunctionResponse{
    Name: generalPurposeTool.FunctionDeclarations[0].Name,
    Response: map[string]any{
      "result": result,
    },
  })
  if err != nil {
    log.Fatal(err)
  }
  for _, cand := range res.Candidates {
    if cand.Content != nil {
      for _, part2 := range cand.Content.Parts {
        fmt.Println(part2)
      }
    }
  }

main.go

Conclusion

In this article, you have seen how to run Go code generated by an LLM in a Docker container. This powerful concept allows an LLM to write code that solves tasks too complex for it to solve directly. Using Go works very similarly to using Python or JavaScript, and thanks to the very fast Go compiler, the execution time is not much different from that of an interpreted language.