Searching "Have I been pwned?" passwords locally with Java

Published: March 01, 2018  •  java

Just a few days ago haveibeenpwned.com released an updated version of the passwords list with over 501 millions password hashes from data breaches. On their website you can check if your passwords are on this list. The service does a very clever thing to check if passwords are on the list but not reveal too much information about the password to the back end service.

A JavaScript application on the website first calculates the SHA-1 hash, sends the first 5 characters of the hash to the service. The service responds with a list of all password hashes that start with these 5 characters. The JavaScript code in the browser then checks if the SHA-1 hash of the password in question matches one on the list. Read more about this in this blog post from Troy Hunt (the developer of Have I been pwned).

The service also provides an API that you can access with any HTTP client. Here an example in Java and the OkHttp library.

    String password = "123456";
    byte[] passwordBytes = md.digest(password.getBytes());
    String hex = HexUtil.byteArrayToString(passwordBytes).toUpperCase();
    String prefixHash = hex.substring(0, 5);
    String suffixHash = hex.substring(5);

    OkHttpClient client = new OkHttpClient();
    String url = "https://api.pwnedpasswords.com/range/" + prefixHash;

    Request request = new Request.Builder().url(url).build();
    try (Response response = client.newCall(request).execute();
        ResponseBody body = response.body()) {
      if (body != null) {
        String hashes = body.string();
        String lines[] = hashes.split("\\r?\\n");

        for (String line : lines) {
          if (line.startsWith(suffixHash)) {
            System.out.println(
                "password found, count: " + line.substring(line.indexOf(":") + 1));
            return;
          }
        }
        System.out.println("password not found");
      }

    }

ApiExample.java

You find more information about the API on this site: https://haveibeenpwned.com/API/v2


If you don't like to use a third-party service and send the passwords over the internet (even it's only a part of the SHA-1 hash) to check, you can download the whole 501 millions passwords onto your computer and then search it locally. This is what we will do in this blog post and the challenge is to do everything in Java. Here are the tasks.

overview

We will write one Java application for each of this tasks

Download

You can download the password file via a HTTP link, but the developer of the website recommends to download the list with BitTorrent.

In Java, we can use the ttorrent library that implements the BitTorrent protocol.

    <dependency>
      <groupId>com.turn</groupId>
      <artifactId>ttorrent-core</artifactId>
      <version>1.5</version>
    </dependency>

pom.xml

But before we can start the download we need the torrent file. The torrent file contains metadata about the file to be downloaded and a list of tracker locations that help peers to find each other. You could download this file manually from this URL (https://downloads.pwnedpasswords.com/passwords/pwned-passwords-2.0.txt.7z.torrent) but the challenge is to do everything with Java. Therefor we download the file with the OkHttp HTTP client library

    <dependency>
      <groupId>com.squareup.okhttp3</groupId>
      <artifactId>okhttp</artifactId>
      <version>3.10.0</version>
    </dependency>

pom.xml

    String torrentURL = "https://downloads.pwnedpasswords.com/passwords/pwned-passwords-2.0.txt.7z.torrent";

    Path torrentFile = Paths.get("e:/temp/pwned.torrent");
    OkHttpClient client = new OkHttpClient();
    Request request = new Request.Builder().url(torrentURL).build();
    try (Response response = client.newCall(request).execute();
        ResponseBody body = response.body()) {
      if (body != null) {
        Files.copy(body.byteStream(), torrentFile, StandardCopyOption.REPLACE_EXISTING);
      }
      else {
        System.out.println("could not download torrent file");
        return;
      }
    }

Download.java

Next we instantiate the ttorent Client class and pass the path to the torrent file and the path to the output directory, where the client can store the downloaded file.

    Path outputDir = Paths.get("e:/temp");

    Client torrentClient = new Client(InetAddress.getLocalHost(),
        SharedTorrent.fromFile(torrentFile.toFile(), outputDir.toFile()));

    torrentClient.addObserver((observable, data) -> {
      float progress = ((Client) observable).getTorrent().getCompletion();
      System.out.println(progress + "%");
    });

    torrentClient.download();
    torrentClient.waitForCompletion();

Download.java

The observer is optional, but because the file is quite large and the download can take a while, it's helpful to see the progress. The ttorrent Client has a built-in recovery mechanism that resumes the download if the download or the application unexpectedly stopped. You can restart the application at any time and the client will resume the download where it stopped.

Make sure that you have enough space on the disk before you start the application. The file has a size of 8.8 GB.


Decompress

The file we downloaded is a 7z archive. Java does not have built-in support for this file format, but fortunately for us, the common-compress project from Apache contains support for this format.

    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-compress</artifactId>
      <version>1.16.1</version>
    </dependency>
    <dependency>
      <groupId>org.tukaani</groupId>
      <artifactId>xz</artifactId>
      <version>1.8</version>
    </dependency>

pom.xml

    Path dir = Paths.get("e:/temp");
    Path passwordFile = dir.resolve(Paths.get("pwned-passwords-2.0.txt.7z"));
    try (SevenZFile sevenZFile = new SevenZFile(passwordFile.toFile())) {
      SevenZArchiveEntry entry = sevenZFile.getNextEntry();
      byte[] buffer = new byte[65536];
      while (entry != null) {
        Path f = dir.resolve(Paths.get(entry.getName()));
        try (OutputStream os = Files.newOutputStream(f)) {
          int count;
          while ((count = sevenZFile.read(buffer, 0, buffer.length)) > -1) {
            os.write(buffer, 0, count);
          }
        }
        entry = sevenZFile.getNextEntry();
      }
    }

Decompress.java

7z is, like zip, a container format and can contain multiple files. This archive only contains one entry, a text file with the name pwned-passwords-2.0.txt. The decompressed file occupies about 29.4 GB space on the SSD or hard disk. The decompress operation can take a while, on my computer this runs for about 19 minutes.

At this point you can delete the compressed file from the first step (pwned-passwords-2.0.txt.7z) if you run low on disk space.


Import

The file we downloaded and decompressed is a text file and contains a list of SHA-1 hashes. After the hash follows a colon (:) and the count of how many times the password has been seen in all the data breaches.

7C4A8D09CA3762AF61E59520943DC26494F8941B:20760336
F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7016669
B1B3773A05C0ED0176787A4F1574FF0075F7521E:3599486
5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3303003
3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:2900049
7C222FB2927D828AF22F592134E8932480637C0D:2680521

In case you are wondering what the first hash is with a count of over 20 millions. It's 123456.

We will store the data in a Xodus database, a transactional schema-less embedded database from JetBrains, the developers of IntelliJ and Kotlin.

    <dependency>
      <groupId>org.jetbrains.xodus</groupId>
      <artifactId>xodus-environment</artifactId>
      <version>1.2.1</version>
    </dependency>

pom.xml

Xodus is written in Java and Kotlin and can be embedded into any Java application. It is a key/value store and a good fit for this use case. We will use the password SHA-1 hash as the key, because in the last step we want to search for passwords, and as value we store the count.

Xodus organizes the data in stores inside environments. Every environment can hold multiple stores, we only need one and call it "passwords".

For reading the file we use the Files.lines() method that reads the file line by line into the application. Every database operation with Xodus needs to run inside a transaction, but because we import a lot of data inside one transaction the application flushes all pending changes after importing 10.5 million entries. The import did not work without this flush and got very slow after a while.

    final long totalLines = 501636842L;

    try (Environment env = Environments.newInstance("e:/temp/pwnd")) {
      env.executeInTransaction((@NotNull final Transaction txn) -> {
        Store store = env.openStore("passwords", StoreConfig.WITHOUT_DUPLICATES, txn);
        Path inputFile = Paths.get("E:/temp/pwned-passwords-2.0.txt");
        try {
          AtomicLong round = new AtomicLong();
          AtomicLong counter = new AtomicLong();
          Files.lines(inputFile).forEach(line -> {
            long c = counter.incrementAndGet();
            if (c > 10_500_000) {
              txn.flush();
              System.out.printf("Imported: %3.1f%% %n",
                  (double) (round.incrementAndGet() * 10_500_000) / (double) totalLines
                      * 100.0d);
              counter.set(0);
            }
            handleLine(store, txn, line);
          });

          txn.commit();
        }
        catch (IOException e) {
          e.printStackTrace();
        }
      });
    }

Importer.java

Everything an application wants to store in Xodus needs to be a ByteIterable, that's a combination of a byte array and an iterable. The Xodus library provides convenience methods and classes that convert from common Java types to ByteIterable.

The SHA-1 hash in the input file is a 40 character hex string, but because Xodus is able to work with byte arrays we can save space and convert the hex string into a byte array that only occupies 20 bytes and then store that as the key.

  static void handleLine(Store store, Transaction txn, String line) {
    String sha1 = line.substring(0, 40);
    int count = Integer.parseInt(line.substring(41).trim());

    ByteIterable key = new ArrayByteIterable(hexStringToByteArray(sha1));
    store.put(txn, key, IntegerBinding.intToCompressedEntry(count));
  }

  private static byte[] hexStringToByteArray(String s) {
    byte[] data = new byte[20];
    for (int i = 0; i < 40; i += 2) {
      data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
          + Character.digit(s.charAt(i + 1), 16));
    }
    return data;
  }

Importer.java

The database occupies about 18.1 GB of space on the disk. The import can run for a while, on my computer this task takes about 32 minutes.


Search

The final application is the search that looks if a given password is stored in the database.

First we need a SHA-1 encoder, that converts a plaintext password into a hash. Java has built-in support for that with the MessageDigest class.

  private static MessageDigest md;
  static {
    try {
      md = MessageDigest.getInstance("SHA-1");
    }
    catch (NoSuchAlgorithmException e) {
      e.printStackTrace();
    }
  }

Search.java

The method that queries for the password takes the Environment, that is a Xodus class and represents the database, and the plain text password. As mentioned before, every database operation needs to run inside a transaction, because the method only reads data from the database, we start a read only transaction. The method then calculates the SHA-1 with the MessageDigest and queries the database with store.get(), which returns the value, in our case the counter, if the key exists or null if it does not.


  private static Integer haveIBeenPwned(Environment env, String password) {
    return env.computeInReadonlyTransaction(txn -> {
      Store store = env.openStore("passwords", StoreConfig.WITHOUT_DUPLICATES, txn);
      byte[] passwordBytes = md.digest(password.getBytes());
      ByteIterable key = new ArrayByteIterable(passwordBytes);
      ByteIterable bi = store.get(txn, key);
      if (bi != null) {
        return IntegerBinding.compressedEntryToInt(bi);
      }
      return null;
    });

Search.java

In the main method we instantiate the Xodus Environment and point it to the location where the database is stored on disk. Then it iterates through a few example passwords and checks if they exist in the database and prints out their count.

  public static void main(String[] args) {
    try (Environment env = Environments.newInstance("e:/temp/pwnd")) {
      for (String pw : Arrays.asList("123456", "password", "654321", "qwerty",
          "letmein")) {
        long start = System.currentTimeMillis();
        Integer count = haveIBeenPwned(env, pw);
        if (count != null) {
          System.out.println("I have been pwned. Number of occurrences: " + count);
        }
        else {
          System.out.println("Password not found");
        }
        System.out.println(System.currentTimeMillis() - start + " ms");
        System.out.println();
      }
    }

Search.java

On my computer one search operation needs between 1 and 100 milliseconds to look for a hash.

This concludes our excursion. We could accomplish every task in Java and are now able to query the Have I been Pwned database locally on our computer, although it needed a bit of time and a lot of disk space.

You find the complete source code on GitHub:
https://github.com/ralscha/blog/tree/master/pwnd