I’ve recently been playing around with the program Mendeley for storing my massive collection of academic papers in PDF format. Mendeley looks to be a really useful bit of software, but at the moment it’s rather horrifically buggy. A major problem I’ve been running into is that it’s quite happy to import duplicate PDFs. This led to much amusement when I set Mendeley to watch my collection of papers, and it decided to parse and import all of the papers every time it started up. Before long, Mendeley was trying to extract metadata for ~20,000 PDFs…
Cleaning the dupes out isn’t too hard. Here’s how I did it.
- Close Mendeley
- Find your Mendeley data directory. On a Mac, it’s “~/Library/Application Support/Mendeley Desktop”.
- Find the SQLite database in that directory. It’ll be named something like your-email-address@www.mendeley.com.sqlite
- Make a backup! (replace “your-email-address\@www.mendeley.com.sqlite” with your database file)
cp your-email-address\@www.mendeley.com.sqlite backup.sqlite
- Access the database:
sqlite3 your-email-address\@www.mendeley.com.sqlite
- You should now be at the SQLite prompt.
- At the prompt, type:
SELECT COUNT(*) as entries, title, year FROM Documents GROUP BY title, year HAVING entries > 1;
… and you’ll get a list of the entries in your database that have the same paper title and paper year.
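- If this looks OK, we can delete the duplicates, keeping only the entry with the highest id in each group. Entries with a blank title and the same year are grouped together as well, so check that none of those show up in the list first: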
DELETE FROM Documents WHERE id NOT IN (SELECT MAX(id) FROM Documents GROUP BY title,year);
- Then tidy up to reclaim the empty space the deleted entries leave behind:
VACUUM;
- Restart Mendeley, cross your fingers, and hope it worked!
If it doesn’t work, or you lose papers you wanted to keep, close Mendeley, copy the backup file (backup.sqlite) over the database file, and restart. Hopefully, the Mendeley developers will implement a better way of doing this soon, but until then – use this at your own risk!
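The steps above could also be scripted. What follows is a minimal Python sketch rather than a tested Mendeley tool. It assumes only what the queries above rely on: a Documents table with id, title and year columns. The names dedupe_documents and make_demo_db are made up for illustration, and the demo database is a throwaway stand-in for the real one. As an extra precaution that is not part of the steps above, it leaves rows with a blank title alone, so untitled entries are never collapsed into one.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def make_demo_db(db_path):
    """Build a tiny stand-in database (hypothetical data, not Mendeley's)."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE Documents (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
        )
        conn.executemany(
            "INSERT INTO Documents (id, title, year) VALUES (?, ?, ?)",
            [
                (1, "Paper A", 2009),
                (2, "Paper A", 2009),   # duplicate of 1
                (3, "Paper A", 2010),   # same title, different year: kept
                (4, "Paper B", 2009),
                (5, None, None),        # untitled rows are left alone
                (6, None, None),
                (7, "Paper A", 2009),   # duplicate of 1; highest id, so kept
            ],
        )
    conn.close()


def dedupe_documents(db_path, backup_path):
    """Back up the database, then delete all but the highest-id row in
    each (title, year) group. Returns the number of rows deleted."""
    shutil.copy2(db_path, backup_path)  # same idea as the `cp` step

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                """
                DELETE FROM Documents
                WHERE title IS NOT NULL AND title <> ''
                  AND id NOT IN (
                      SELECT MAX(id) FROM Documents GROUP BY title, year
                  )
                """
            )
            deleted = cur.rowcount
        conn.execute("VACUUM")  # runs after the commit, outside any transaction
    finally:
        conn.close()
    return deleted


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())
    make_demo_db(workdir / "demo.sqlite")
    removed = dedupe_documents(workdir / "demo.sqlite", workdir / "backup.sqlite")
    print("rows deleted:", removed)
```

Doing the delete inside a transaction means that an error part-way through leaves the database untouched, and the backup copy is still there if the result turns out to be wrong. As with the manual steps, Mendeley should be closed before running anything like this against the real file.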
Hello Simon,
Sorry to hear about the problem – could you confirm which version of Mendeley you are using? (You can check via ‘Mendeley Desktop’ -> ‘About Mendeley Desktop’ in the menu). Can I also check whether Mendeley tries to re-import files on every startup or only in certain circumstances (e.g. clicking the Recover button in the Crash Recovery dialog when restarting Mendeley after a crash)?
Mendeley Desktop should be checking for duplicates in several ways during import but evidently something has gone wrong. I should clarify what is supposed to happen here:
1. When Mendeley imports a file via a watched folder, it records the filename in the MonitoredFiles table of an SQLite database called monitor.sqlite in the Mendeley data directory. When Mendeley starts up and scans your watched folders, it should skip any files whose paths are already in that database.
This file does currently get deleted, though, if you click ‘Recover’ when prompted by a crash recovery dialog – the idea being that if the program crashed when importing a particular PDF or batch of PDFs, it should avoid trying to re-do the import straight away on the next startup.
2. When you import a file via any means (e.g. drag-and-drop, File -> Add Files), Mendeley will check if that file is already associated with any of the existing documents in your library – in which case you’ll still see the ‘Importing X files’ message in the status bar, but it should just skip that document once it realises that the files are the same.
3. Finally, there is a slightly fuzzy metadata comparison, which checks the imported metadata against that of articles already in your library for duplicates.
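To make check 1 concrete, here is a toy sketch in Python. It is an illustration only, not the actual Mendeley implementation: the MonitoredFiles table name comes from the description above, while the path column, the function name scan_watched_folder and the in-memory database are assumptions made for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def scan_watched_folder(conn, folder):
    """Return the names of PDFs in `folder` that have not been recorded
    yet, and record them so that later scans skip them."""
    conn.execute(
        # Hypothetical schema: one row per file path already imported.
        "CREATE TABLE IF NOT EXISTS MonitoredFiles (path TEXT PRIMARY KEY)"
    )
    new_files = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        path = str(pdf.resolve())
        seen = conn.execute(
            "SELECT 1 FROM MonitoredFiles WHERE path = ?", (path,)
        ).fetchone()
        if seen:
            continue  # already imported on an earlier scan
        conn.execute("INSERT INTO MonitoredFiles (path) VALUES (?)", (path,))
        new_files.append(pdf.name)
    conn.commit()
    return new_files


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    (folder / "a.pdf").write_bytes(b"%PDF-1.4")
    conn = sqlite3.connect(":memory:")
    print(scan_watched_folder(conn, folder))  # first scan sees a.pdf
    print(scan_watched_folder(conn, folder))  # second scan finds nothing new
    conn.close()
```

In a scheme like this, losing the table of recorded paths makes every file in the watched folder look new again on the next scan, which fits the symptom of everything being re-imported at startup.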
Thanks for the feedback – you can get in touch via support@mendeley.com (or my address here) if you have any questions.