Access HIBP Pwned Passwords with Go

Published: 10. February 2022  •  Updated: 11. March 2023  •  go

When writing an application where users can sign up and sign in with passwords, it is recommended to follow these guidelines to check the password quality and prevent the users from choosing simple and easy-to-guess passwords. One recommendation is to check the passwords against a list of commonly used or compromised passwords and reject those.

In a previous blog post, I showed you different ways to do that.

One of the largest databases of compromised passwords is Pwned Passwords from have i been pwned?.

The service provides an HTTP API that makes it easy to check if a particular password is in the Pwned Passwords database. The API is free to use, does not require an API key, and can be accessed from a browser or a back end. This blog post focuses on the back end and shows you different ways to access this database with Go.

HTTP API

The first and most obvious way to access the database is by sending an HTTP request to the Pwned Passwords HTTP API. The standard library of Go includes an HTTP client, so we can implement this without any 3rd party library.

To check if a password exists in the database, you don't send the password in plaintext to the service, because that would expose it to a third party. Instead, the API implements a k-anonymity model that allows a client to search for a password by a partial hash.

Here is the workflow a client has to implement:

  1. Create the SHA-1 hash of the password.
  2. Take the first five hex characters of the hash and send them to the Pwned Passwords HTTP API.
  3. The response contains a list of all SHA-1 hashes that start with the same five characters.
  4. The client iterates through the list and checks if the hash from step 1 is in it.
  5. If it's in the list, the password exists in the HIBP password database.
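
Steps 1 and 2 can be sketched as a small helper. This is a minimal sketch and not the program shown later in this post; the function name splitHash is made up for illustration.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"strings"
)

// splitHash returns the SHA-1 of password as upper-case hex, split into the
// five-character prefix that is sent to the API and the 35-character suffix
// that never leaves the client.
func splitHash(password string) (prefix, suffix string) {
	sum := sha1.Sum([]byte(password))
	full := strings.ToUpper(hex.EncodeToString(sum[:]))
	return full[:5], full[5:]
}

func main() {
	prefix, suffix := splitHash("123456")
	fmt.Println("send to API:    ", prefix)
	fmt.Println("compare locally:", suffix)
}
```

Only the prefix is transmitted, so the service sees at most which group of hashes the client is interested in, not the password or its full hash.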

This is an example of how the response looks.

0018A45C4D1DEF81644B54AB7F969B88D65:1
00D4F6E8FA6EECAD2A3AA415EEC418D38EC:2
011053FD0102E94D6AE2F8B83D76FAF94F6:1
012A7CA357541F0AC487871FEEC1891C49C:2
0136E006E24E7D152139815FB0FC6A50B15:2

The first five characters of the hash are not part of the response. The number after the colon (:) says how many times the password appeared in a breach.

To enhance security even further, it is possible to tell the service to add padding. Because each five-character prefix returns a different number of hashes, a man-in-the-middle who observes the response size could infer which prefix the client asked for. With padding, every response has approximately the same size.

If padding is enabled, the service adds random hashes at the end with an appearance count of 0:

FF849C5DE93593756DDF7AECBBE40B9A947:9
FF90FEE99C8E2961AD1077ADEE4C6FEFF30:1
FFA7B0AD1884D08B5AE2644C8D9D76BD4B5:1
85EB3DA4B91A01D5389998B53B24F9091B0:0   <--- padding records
7CC3E32849664C6E4FD387F94B7262D625C:0   <---
3DFAA395724C94E783EB77CB6C7BF429BC2:0   <---

To enable padding, send the request header Add-Padding: true.
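
The response handling described above can be isolated into a function that is easy to test without network access. The following is only a sketch: findSuffix and findSuffixInText are hypothetical names, and the sample response reuses lines from the examples above in an arbitrary order. It skips records with a count of 0 instead of stopping at the first one, so it does not depend on where the service places the padding.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// findSuffix scans a range response for the given 35-character hash suffix.
// It returns the appearance count and true if the suffix is present with a
// count greater than zero. Records with a count of 0 are padding and ignored.
func findSuffix(body io.Reader, suffix string) (int, bool, error) {
	scanner := bufio.NewScanner(body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		hash, countText, ok := strings.Cut(line, ":")
		if !ok {
			return 0, false, fmt.Errorf("malformed line %q", line)
		}
		count, err := strconv.Atoi(countText)
		if err != nil {
			return 0, false, fmt.Errorf("invalid count in line %q: %w", line, err)
		}
		if count == 0 {
			continue // padding record
		}
		if strings.EqualFold(hash, suffix) {
			return count, true, nil
		}
	}
	return 0, false, scanner.Err()
}

// findSuffixInText is a convenience wrapper for responses held in a string.
func findSuffixInText(text, suffix string) (int, bool, error) {
	return findSuffix(strings.NewReader(text), suffix)
}

func main() {
	response := "FF849C5DE93593756DDF7AECBBE40B9A947:9\r\n" +
		"85EB3DA4B91A01D5389998B53B24F9091B0:0\r\n" +
		"FFA7B0AD1884D08B5AE2644C8D9D76BD4B5:1\r\n"

	count, found, err := findSuffixInText(response, "FFA7B0AD1884D08B5AE2644C8D9D76BD4B5")
	fmt.Println(count, found, err)
}
```

Splitting on the colon avoids fixed slice offsets, so a truncated or malformed line produces an error instead of a slice-out-of-range panic.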

Here is a Go program that checks if the password "123456" is in the Pwned Passwords database.

  const password = "123456"
  h := sha1.New()
  _, err := io.WriteString(h, password)
  if err != nil {
    log.Panicf("sha1 write string failed %v\n", err)
  }

  sha1hash := fmt.Sprintf("%x", h.Sum(nil))
  sha1hash = strings.ToUpper(sha1hash)
  httpClient := &http.Client{
    Timeout: time.Second * 10,
  }

  sha1hashFirst5 := sha1hash[0:5]
  sha1hashRest := sha1hash[5:]

  const url = "https://api.pwnedpasswords.com/range/"
  req, err := http.NewRequest("GET", url+sha1hashFirst5, nil)
  if err != nil {
    log.Panicf("creating request failed %v\n", err)
  }
  req.Header.Set("Add-Padding", "true")

  response, err := httpClient.Do(req)
  if err != nil {
    log.Panicf("http client get failed %v\n", err)
  }
  defer response.Body.Close()

  scanner := bufio.NewScanner(response.Body)
  found := false

  for scanner.Scan() {
    line := scanner.Text()
    // the count starts after the 35-character hash suffix and the colon
    appearances, err := strconv.Atoi(line[36:])

    if err != nil {
      log.Println(line)
      log.Panicf("conversion failed %s %v\n", line[36:], err)
    }

    if appearances == 0 {
      break
    }

    if line[:35] == sha1hashRest {
      fmt.Println("found it: ", appearances)
      found = true
      break
    }
  }

  if !found {
    fmt.Println("not found")
  }

main.go

For more information about the API, check out the documentation.

Self-Host

Not everybody likes the idea of sending requests over the Internet to check for compromised passwords. This could be because of security concerns or the concern of being dependent on a 3rd party service.

Fortunately, there is a way to self-host the password database and access it locally. Pwned Passwords is the only service from have i been pwned? that allows you to download the complete dataset.

The dataset is free and can be downloaded without an account. The service does not provide a single file that you can download. Instead, it is possible to download the whole dataset with the range API we used in the previous example. To do that, an application loops through all possible five character hex strings (00000 to FFFFF) and calls the API for each string. have i been pwned? provides an official downloader that implements this download method.
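
Enumerating the prefixes is straightforward. The sketch below only prints a few of the request URLs instead of sending requests; rangePrefix and prefixCount are names chosen for this illustration and are not taken from either downloader.

```go
package main

import "fmt"

// prefixCount is the number of five-character hex prefixes: 16^5 = 1,048,576.
const prefixCount = 1 << 20

// rangePrefix formats i (0 <= i < prefixCount) as a five-character,
// upper-case hex string, e.g. 255 becomes "000FF".
func rangePrefix(i int) string {
	return fmt.Sprintf("%05X", i)
}

func main() {
	const baseURL = "https://api.pwnedpasswords.com/range/"

	// A real downloader would request every prefix from 0 to prefixCount-1.
	for _, i := range []int{0, 1, 2, prefixCount - 1} {
		fmt.Println(baseURL + rangePrefix(i))
	}
	fmt.Println("total requests:", prefixCount)
}
```

With more than a million requests, a real downloader would typically run several requests in parallel and retry failed ones.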

You can also use my downloader written in Go: https://github.com/ralscha/hibp-passwords-downloader

The following applications expect the data as individual files. Choose the appropriate option when running the downloader.

Each line of the downloaded text files contains a SHA-1 hashed password. The hash is followed by a colon and the number of times the password appeared in a breach.

00CF24504B1DC456D5CB53604B3B21BA22D:5
00EEBDBCAEB41775D3CCC3DCFAD5E3A3778:1
011657A919210BFD211C1B3BCD40F69591E:3
018DBD1B4098C8E29C3176AF01170A6B023:1
01BA5C35923722B4C41BB548465BA8121F7:1
020F02174603CFF7CF01282D3D6C64B22D4:5
02256EBA5CE9629C1C1ED4460922555FF46:3

Note that each line only contains 35 characters of the SHA-1. The API omits the five character prefix in the response. So remember to add the prefix to the hash read from the file before processing the data.
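
As a sketch of that step, the following helper rebuilds the full hash from a file name and one line of the file. It assumes the files are named after their prefix plus an extension; the file name 00000.txt and the function name parseFileLine are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// parseFileLine rebuilds the full 40-character hash from the file name
// (assumed to be "<prefix>.<ext>") and one "<suffix>:<count>" line.
func parseFileLine(fileName, line string) (string, uint64, error) {
	prefix := strings.TrimSuffix(filepath.Base(fileName), filepath.Ext(fileName))
	suffix, countText, ok := strings.Cut(strings.TrimSpace(line), ":")
	if !ok || len(prefix) != 5 || len(suffix) != 35 {
		return "", 0, fmt.Errorf("unexpected input %q / %q", fileName, line)
	}
	count, err := strconv.ParseUint(countText, 10, 64)
	if err != nil {
		return "", 0, fmt.Errorf("invalid count %q: %w", countText, err)
	}
	return prefix + suffix, count, nil
}

func main() {
	// The file name is made up; the line is the first sample line above.
	hash, count, err := parseFileLine("00000.txt", "00CF24504B1DC456D5CB53604B3B21BA22D:5")
	if err != nil {
		panic(err)
	}
	fmt.Println(hash, count)
}
```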

We need to import the file into a database to have fast access to the data. A text search would take too long.

You can import this data into any database. For this blog post, I will show you how to use an embedded database in Go. A key/value store fits the data structure perfectly, and the Go ecosystem offers different options for this kind of database: bbolt, goleveldb, BadgerDB and Pebble.

I will show you the code with Pebble, which produced the smallest database file in my tests.

Pebble is the database engine underneath the CockroachDB database. It is a RocksDB/LevelDB inspired key-value database.

The importer reads all the downloaded Pwned Passwords text files and imports the data into the Pebble database. Pebble is a file-based database, so you must specify a directory when opening the database with pebble.Open.

func main() {

  inputDir := "./pwned"

  files, err := os.ReadDir(inputDir)
  if err != nil {
    log.Fatalf("Can't read input directory %v", err)
  }
  var fileNames []string
  for _, file := range files {
    fileNames = append(fileNames, file.Name())
  }
  sort.Strings(fileNames)

  path := "/var/lib/hibp/pebble"
  err = os.MkdirAll(path, os.ModePerm)
  if err != nil {
    log.Fatalf("Can't create data directory %v", err)
  }

  db, err := pebble.Open(path, &pebble.Options{DisableWAL: true})
  if err != nil {
    log.Fatalf("Can't open database %v", err)
  }
  defer func(db *pebble.DB) {
    err := db.Close()
    if err != nil {
      log.Fatalf("Can't close database %v", err)
    }
  }(db)

  for _, fileName := range fileNames {
    file, err := os.Open(inputDir + "/" + fileName)
    if err != nil {
      log.Fatalf("Can't open file %s, %v", fileName, err)
    }

    batch := db.NewBatch()

    hashPrefix := strings.Split(fileName, ".")[0]

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
      line := scanner.Text()
      line = hashPrefix + line
      hexString := line[0:40]
      appears, err := strconv.ParseUint(line[41:], 10, 32)
      if err != nil {
        log.Fatalf("String to int conversion failed %v", err)
      }

      decodedHex, err := hex.DecodeString(hexString)
      if err != nil {
        log.Fatalf("Failed to decode hex string %v", err)
      }

      buf := make([]byte, binary.MaxVarintLen64)
      n := binary.PutUvarint(buf, appears)
      appearsBytes := buf[:n]

      // insert into database
      err = batch.Set(decodedHex, appearsBytes, pebble.NoSync)
      if err != nil {
        log.Fatalf("Set value into database failed %v", err)
      }
    }

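    // check for read errors before committing the batch
    if err := scanner.Err(); err != nil {
      log.Fatalf("Scanner failed %v", err)
    }

    err = batch.Commit(pebble.NoSync)
    if err != nil {
      log.Fatalf("Commit failed %v", err)
    }

    // close the file right away; with one file per prefix, leaked handles add up quickly
    if err := file.Close(); err != nil {
      log.Fatalf("Can't close file %s, %v", fileName, err)
    }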
  }

  err = db.Flush()
  if err != nil {
    log.Fatalf("Flush failed %v", err)
  }

}

main.go

Because Pebble allows arbitrary binary values as the key, the import program converts the SHA-1 hex string into the raw form. So instead of 40 bytes for the hex string, it only stores 20 bytes per key. In addition, for the appearance value, it uses binary.PutUvarint to convert the number into an unsigned variable length encoded number to save even more space.
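
The space savings can be checked in isolation with the standard library alone. The following sketch mirrors the encoding used by the importer; encodeEntry and decodeCount are illustrative names, and the counts are arbitrary sample values.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// encodeEntry converts a 40-character hex hash and its count into the compact
// form: a 20-byte raw key and a uvarint-encoded value.
func encodeEntry(hexHash string, count uint64) (key, value []byte, err error) {
	key, err = hex.DecodeString(hexHash)
	if err != nil {
		return nil, nil, err
	}
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, count)
	return key, buf[:n], nil
}

// decodeCount reverses the value encoding.
func decodeCount(value []byte) (uint64, error) {
	count, n := binary.Uvarint(value)
	if n <= 0 {
		return 0, fmt.Errorf("invalid uvarint value % x", value)
	}
	return count, nil
}

func main() {
	for _, count := range []uint64{5, 300, 1_000_000} {
		key, value, err := encodeEntry("7C4A8D09CA3762AF61E59520943DC26494F8941B", count)
		if err != nil {
			panic(err)
		}
		decoded, err := decodeCount(value)
		if err != nil {
			panic(err)
		}
		fmt.Printf("count=%d key=%d bytes value=%d bytes decoded=%d\n",
			count, len(key), len(value), decoded)
	}
}
```

Small counts need a single byte, and the value only grows by one byte for every additional seven bits of the number.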


Next, we need a program that reads the data. The following application is a simple command-line tool where you can pass the password as the argument.

  if len(os.Args) != 2 {
    log.Fatalf("Password is missing.\nUsage: %s <password>", os.Args[0])
  }

  dbPath := "/var/lib/hibp/pebble"
  db, err := pebble.Open(dbPath, &pebble.Options{})
  if err != nil {
    log.Fatalf("Can't open database %v", err)
  }
  defer db.Close()
  password := os.Args[1]

  h := sha1.New()
  h.Write([]byte(password))
  sha1hash := h.Sum(nil)

  data, closer, err := db.Get(sha1hash)
  if err != nil && err != pebble.ErrNotFound {
    log.Fatalf("Can't get value for key %x: %v", sha1hash, err)
  }

  if err == nil {
    defer closer.Close()
    value, _ := binary.Uvarint(data)
    fmt.Printf("Password found. It appears %d times in the database\n", value)
  } else {
    fmt.Println("Password not found in the HIBP database")
  }

main.go

The program calculates the SHA-1 of the input and runs a key lookup (db.Get).

Self-host HTTP API Server

The disadvantage of using an embedded database is that only one process can access the data. So if multiple services need to read the database, you must copy the database. But instead of doing that, we can write our own Pwned Passwords HTTP API server. Thanks to the comprehensive Go standard library, we can do that without any additional 3rd party library.

You can find my implementation on GitHub.

The server opens the database we created in the previous section and waits for incoming requests. It uses Pebble's prefix key scan feature to find all keys that start with a prefix. It also checks for the add-padding request header and adds random hashes to the response.
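
One detail of such a prefix scan is that five hex characters are 2.5 bytes, so the prefix does not align with the raw 20-byte keys. A possible way to handle this, shown here as a sketch and not necessarily what my server does, is to turn the prefix into a lower and an upper key bound. Pebble iterators accept such bounds through pebble.IterOptions (LowerBound is inclusive, UpperBound is exclusive). The function names are made up for this example, and the code uses only the standard library.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"strings"
)

// prefixBounds returns the key range [lower, upper) that covers every 20-byte
// SHA-1 key whose hex form starts with the given five-character prefix.
func prefixBounds(prefix string) (lower, upper []byte, err error) {
	if len(prefix) != 5 {
		return nil, nil, fmt.Errorf("prefix must have 5 hex characters, got %q", prefix)
	}
	lower, err = hex.DecodeString(prefix + strings.Repeat("0", 35))
	if err != nil {
		return nil, nil, err
	}
	last, err := hex.DecodeString(prefix + strings.Repeat("F", 35))
	if err != nil {
		return nil, nil, err
	}
	// Appending a zero byte gives the smallest key that sorts after the last
	// 20-byte key with this prefix, which works as an exclusive upper bound.
	upper = append(last, 0x00)
	return lower, upper, nil
}

// keyInPrefix reports whether the raw key of hexHash lies in the range
// computed for prefix.
func keyInPrefix(hexHash, prefix string) (bool, error) {
	key, err := hex.DecodeString(hexHash)
	if err != nil {
		return false, err
	}
	lower, upper, err := prefixBounds(prefix)
	if err != nil {
		return false, err
	}
	return bytes.Compare(lower, key) <= 0 && bytes.Compare(key, upper) < 0, nil
}

func main() {
	lower, upper, err := prefixBounds("7C4A8")
	if err != nil {
		panic(err)
	}
	fmt.Printf("lower: %X\nupper: %X\n", lower, upper)

	ok, err := keyInPrefix("7C4A8D09CA3762AF61E59520943DC26494F8941B", "7C4A8")
	fmt.Println("key in range:", ok, err)
}
```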

I tried to copy all the features of the original service. If you are interested, you can find the source code of the actual Pwned Passwords service here. It is implemented as an Azure Function.

Because my implementation behaves the same as the original, it is possible to take any HIBP client and point it to the Go service. For example, take the first example from this blog post, change the URL, and run it. It still works.

Bloom

The downside of the previous solutions is the huge database size on disk. It needs about 20GB of storage for the database.

If that is a problem, you could look at a different data structure. A Bloom filter is an interesting solution. It is a probabilistic data structure designed to tell, space-efficiently, whether an element is present in a set. It tells you that an element is either definitely not in the set or possibly in the set. That means it can report false positives, but never false negatives.

When we store the Pwned Passwords file in a Bloom filter, we can check if a password is definitely not in the set. But the Bloom filter might tell us that a password is in the set where in reality, it's not (false positive).

The probability of false positives is configurable and determines the size of the Bloom filter in memory. The smaller the false positive rate, the larger the memory usage.
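
The relationship can be estimated with the textbook formulas for an optimally sized Bloom filter: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions, for n elements and a false positive rate p. The following sketch applies them to the number of hashes used in the lookup example of this post. The function names are made up, and a concrete library may round slightly differently, so treat the results as estimates.

```go
package main

import (
	"fmt"
	"math"
)

// bloomParams returns the textbook-optimal number of bits m and number of
// hash functions k for n elements and false positive rate p.
func bloomParams(n uint64, p float64) (m uint64, k uint64) {
	bits := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	hashes := math.Ceil(bits / float64(n) * math.Ln2)
	return uint64(bits), uint64(hashes)
}

// mebibytes converts a number of bits into MiB.
func mebibytes(bits uint64) float64 {
	return float64(bits) / 8 / (1 << 20)
}

func main() {
	const n = 851_082_816 // number of password hashes
	for _, p := range []float64{0.1, 0.01, 0.001} {
		m, k := bloomParams(n, p)
		fmt.Printf("p=%g: about %.0f MiB, %d hash functions\n", p, mebibytes(m), k)
	}
}
```

Every tenfold reduction of the false positive rate costs roughly 4.8 additional bits per stored hash, so the filter grows linearly while the error rate shrinks exponentially.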

The following application uses this Bloom library.
Also, check out the Awesome Go list for other Bloom filter implementations.

Reading the whole Pwned Passwords text file into the Bloom filter takes a long time, and it would not be feasible to do that every time the application starts. So instead, an import program imports the data once and stores a binary representation of the Bloom filter as a file.

func main() {
  inputDir := "./pwned"

  files, err := os.ReadDir(inputDir)
  if err != nil {
    log.Fatalf("Can't read input directory %v", err)
  }
  var fileNames []string
  for _, file := range files {
    fileNames = append(fileNames, file.Name())
  }
  sort.Strings(fileNames)

  // count number of password hashes
  var nPasswords uint
  for _, fileName := range fileNames {
    file, err := os.Open(inputDir + "/" + fileName)
    if err != nil {
      log.Fatalf("Can't open file %s, %v", fileName, err)
    }
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
      nPasswords++
    }
    err = file.Close()
    if err != nil {
      log.Fatalf("Can't close file %s, %v", fileName, err)
    }
  }

  fmt.Println("nPasswords: ", nPasswords)

  fps := []float64{0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001, 0.00000001}
  for ix, fp := range fps {
    fmt.Println("false positive rate: ", fp)
    filter := bloom.NewWithEstimates(nPasswords, fp)
    insert(fileNames, inputDir, filter, nPasswords)

    encode, err := filter.GobEncode()
    if err != nil {
      log.Fatal("gob encode failed", err)
    }

    file, err := os.Create("./pwned-passwords-sha1-ordered-by-hash-v8_" + strconv.Itoa(ix+1) + ".gob")
    if err != nil {
      log.Fatal("Can't create bloom file", err)
    }

    _, err = file.Write(encode)
    if err != nil {
      log.Fatal("Can't write bloom file", err)
    }

    fmt.Println("bytes: ", len(encode))
    err = file.Close()
    if err != nil {
      log.Fatal("Can't close bloom file", err)
    }
    fmt.Println()
  }

}

func insert(fileNames []string, inputDir string, filter *bloom.BloomFilter, nPasswords uint) {
  var count uint
  for _, fileName := range fileNames {
    file, err := os.Open(inputDir + "/" + fileName)
    if err != nil {
      log.Fatalf("Can't open file %s, %v", fileName, err)
    }
    hashPrefix := strings.Split(fileName, ".")[0]
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
      line := scanner.Text()
      line = hashPrefix + line
      hexString := line[0:40]

      decodedHex, err := hex.DecodeString(hexString)
      if err != nil {
        log.Fatalf("Failed to decode hex string %s, %v", hexString, err)
      }
      count++
      filter.Add(decodedHex)

      if count%10_000_000 == 0 {
        fmt.Println(count*100/nPasswords, "%")
      }
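    }
    if err := scanner.Err(); err != nil {
      log.Fatalf("Scanner failed %s, %v", fileName, err)
    }
    // close each file before opening the next one to avoid leaking file handles
    if err := file.Close(); err != nil {
      log.Fatalf("Can't close file %s, %v", fileName, err)
    }
  }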
}

main.go

A program reads the serialized Bloom filter from the disk into memory and then looks up the password.

  file, err := os.Open("./pwned-passwords-sha1-ordered-by-hash-v8_1.gob")
  if err != nil {
    log.Fatal("Can't open pwned file", err)
  }
  defer file.Close()

  const nPasswords = 851_082_816
  const falsePositiveRate = 0.1
  filter := bloom.NewWithEstimates(nPasswords, falsePositiveRate)

  bytes, err := io.ReadAll(file)
  if err != nil {
    log.Fatal("Can't read pwned file", err)
  }

  err = filter.GobDecode(bytes)
  if err != nil {
    log.Fatal("Can't decode bytes", err)
  }

  const password = "123456"

  h := sha1.New()
  h.Write([]byte(password))
  sha1hash := h.Sum(nil)

  if filter.Test(sha1hash) {
    log.Println("Password is in the filter")
  } else {
    log.Println("Password is not in the filter")
  }

main.go

Here is an overview of how the false positive rate influences the size of the Bloom filter.

False positive rate    Filter size
10 %                   487 MB
1 %                    973 MB
0.1 %                  1.5 GB

That concludes this tutorial about accessing the Pwned Passwords database with Go. It is pretty easy to self-host the database and access it without sending requests over the Internet and being dependent on a 3rd party service.