Google Photos Takeout Hell: Fix Meta & Duplicates

Why I Escaped Google Photos #

I have been trying for a while to escape the Google ecosystem. In fact, it’s easier than it seems; it only becomes difficult when you realise how many services you use and how much data you provide them with. Some time ago, however, I reached a point where I decided to transfer my largest data set: my images, sadly located in my Google Photos account. I was shocked to realise that getting started was the least of my worries.

Sometimes beginnings are hard. #

The beginning was not scary. First, I had to download my data from Google’s servers. For this, I used the Google Takeout tool (takeout.google.com) and deselected all options except Google Photos. This matters because the time you wait for your Takeout to be generated depends on how many files it has to prepare and from how many services. In my case, it took a couple of hours for about 45 GB. Once I had unpacked the archive, I quickly scanned it and noticed that the metadata for each picture had been saved in a separate JSON file. Why is this a problem? I don’t know how you store your pictures, but for me, it’s important to be able to sort by creation date. Without that metadata embedded in the files, they only carry the date of the Takeout itself, which is incorrect!

cd /PATH/TO/WHERE/YOU/SAVED/TAKEOUT_ARCHIVES
tar -xzf TAKEOUT_ARCHIVE_1.tgz
# repeat for TAKEOUT_ARCHIVE_2.tgz, etc.

# Inside you should see something like:
# Google Photos/
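
Each picture in the export sits next to a sidecar file holding its metadata, including the capture date. Below is a rough sketch of that layout (the file names are made up), plus an optional spot check with exiftool, if you have it installed:

# Typical layout: every photo has a JSON sidecar next to it
# Google Photos/Photos from 2023/IMG_1234.jpg
# Google Photos/Photos from 2023/IMG_1234.jpg.json   <- the real capture date lives here

# Spot check the embedded capture date (often missing, leaving only the Takeout date on the file)
exiftool -DateTimeOriginal 'Google Photos/Photos from 2023/IMG_1234.jpg'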

Okay, I took a deep breath and started to fix it.

Let’s get it together! #

Thank goodness for open source! It’s no surprise that someone has had the same problem before and found a solution. I found a great tool on GitHub by garzj: google-photos-migrate. It’s a simple command-line tool that merges each photo and its JSON sidecar back into a single, correctly tagged file. Bingo!

git clone https://github.com/garzj/google-photos-migrate.git
cd google-photos-migrate
docker build -t gpm .

I decided to run the tool inside a Docker container. I don’t know why, but this has been my standard way of running scripts for a long time. After waiting a couple of minutes, I had everything up and running. Voilà! In my case, around 87–88% of the files were processed successfully; the rest were mostly motion clips without JSON, obvious duplicates, or screenshots with no real metadata. I accepted that some video files would remain with incorrect metadata and didn’t investigate further.

mkdir -p /PATH/TO/OUTPUT_DIR /PATH/TO/ERROR_DIR

docker run --rm -it --security-opt=label=disable \
  -v /PATH/TO/GOOGLE_PHOTOS_EXPORT:/data \
  -v /PATH/TO/OUTPUT_DIR:/output \
  -v /PATH/TO/ERROR_DIR:/error \
  gpm full '/data/Google Photos' '/output' '/error' --timeout 120000
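
To double-check the result, an optional pass with exiftool (assuming you have it installed) can count how many migrated files still lack a capture date; everything else should now sort correctly:

# Count migrated files that still have no DateTimeOriginal tag
exiftool -q -r -if 'not $DateTimeOriginal' -p '$Directory/$FileName' /PATH/TO/OUTPUT_DIR | wc -l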

The Empire Strikes Back #

I was so happy that everything had gone smoothly! I started uploading my photos to their new location, beginning with the albums, which live in separate folders; the rest would follow. You wouldn’t believe how frustrated I was when I realised that the pictures in the albums had been duplicated in the shared folder. This is a good moment to explain the structure of a Google Takeout archive.

├── Photos/          # Main library: originals + edits, with fixed metadata
├── Untitled 1/      # Duplicates from album 1
├── Untitled 2/      # Duplicates from album 2
├── Holidays 2023/   # Album copies (often duplicates of Photos/)
├── Instagram/       # Shared / social duplicates
└── error/           # Files that failed migration
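
Before deciding what to do about the overlap, you can measure how bad it really is. A small sketch, assuming GNU md5sum and uniq, run from the root of the migrated output; it hashes every file and prints those that appear in more than one place:

# List byte-identical files that exist more than once (the first 32 characters are the md5 hash)
find . -type f -exec md5sum {} + | sort | uniq -w32 -D | head -20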

At this point, I had two options: either upload only the ‘Photos’ folder and rebuild the albums manually in the new tool, or upload everything and clean up the duplicates first. Meh! I chose the second option, removing the duplicates from the ‘Photos’ folder while preserving the albums as much as possible. Having used Linux since Fedora 14, I am familiar with Bash, so I prepared the necessary script, and, nobody knows why, I ran it inside a Docker container too…

mkdir -p /PATH/TO/OUTPUT_CLEAN
cp -r /PATH/TO/OUTPUT_DIR /PATH/TO/OUTPUT_CLEAN
cd /PATH/TO/OUTPUT_CLEAN/OUTPUT_DIR
docker run -it --rm -v /PATH/TO/OUTPUT_CLEAN:/data alpine:latest sh -c "
  apk add --no-cache rclone &&
  cd /data/OUTPUT_DIR &&
  echo '=== 1/4 START: ' \$(find . -type f | wc -l) 'files ===' &&
  echo 'Root preview:' &&
  ls -la | head -10 &&

  echo '=== 2/4 Dedupe EVERYTHING --by-hash newest ===' &&
  rclone dedupe . --by-hash --dedupe-mode newest || echo 'No global duplicates found' &&
  echo 'After global dedupe: ' \$(find . -type f | wc -l) 'files' &&

  echo '=== 3/4 Dedupe Photos* --skip (should be clean) ===' &&
  rclone dedupe *Photos* --by-hash --dedupe-mode skip 2>/dev/null || echo 'Photos already clean or empty' &&
  echo 'Photos count: ' \$(find *Photos* -type f 2>/dev/null | wc -l || echo 0) &&

  echo '=== 4/4 Dedupe Untitled* --skip (keep album structure) ===' &&
  rclone dedupe Untitled* --by-hash --dedupe-mode skip 2>/dev/null || echo 'Albums OK' &&
  echo 'Album files: ' \$(find Untitled* -type f 2>/dev/null | wc -l || echo 0) &&

  echo '=== RESULT: ' \$(find . -type f | wc -l) 'files, total size ' \$(du -sh . | cut -f1) ' ===' &&
  ls -lhS *Photos* Untitled* 2>/dev/null | head -10 || echo 'Structure looks fine'
"

This script does four things in one go: it shows how many files you start with, removes pure binary duplicates globally, sanity-checks the ‘Photos’ folder and gently cleans album folders without breaking their structure. In the end, I was left with a single, deduplicated library of about 12–15k files, ready for upload, instead of a tangled mess of overlapping copies.
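
If you are more cautious than I was, rclone’s global --dry-run flag also applies to dedupe, so you can preview what would be removed before committing. A minimal sketch, run from inside the deduplicated copy:

# Preview only: log the duplicates rclone would delete, without touching anything
rclone dedupe . --by-hash --dedupe-mode newest --dry-run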

Yes, well done! #

That’s it. I now have a clean structure with valid meta tags in all the files, so uploading is easy now. The album photos sit in their own folders, and the rest is in the ‘Photos’ folder. It wasn’t the easiest process, and I can see how challenging it could be, particularly for non-technical users. Still, I hope this post is useful for someone in the middle of a breakdown while trying to move their own photos out of Google.