2024-04-27T18:18:12
Status: #idea
Tags: #productivity #technology #excel #scientificpublications #dedupe #deduplications
Links: [[Productivity]] | [[Technology]] | [[Excel]] | [[Scientific Publications]] | [[Dedupe]]
# Reference Cleanup/Deduplication Using Excel
The following instructions are intended to help with checking references in scientific publications for possible duplicates. This is a manual process requiring both Word and Excel in tandem and utilizes simple formulas, sorting, and conditional formatting in Excel to identify duplicates.
1. Download the following Excel spreadsheet template: [References Duplicate Checker.xlsx](https://docs.google.com/spreadsheets/d/1YOIiJ-4lEFUW7gUkXeLPQ1qfc7PhCnqz/edit?usp=sharing&ouid=104660245509558814057&rtpof=true&sd=true)
2. Create a copy of the spreadsheet template for each new project
3. Open the Word document with the list of references
4. Select all references in Word (using click + shift-click method from start to finish of the section):
![[WordReferencesExample.png]]
5. Then use ⌘-C to copy on 🍎 (or Ctrl-C on 🪟) to copy all references to the clipboard
6. Switch over to the Excel spreadsheet
7. Click into first cell in the list in Excel that says "1. paste here" (cell A2):
![[ExcelReferencesCheckerTemplate.png]]
8. Now use ⌘-V on 🍎 (or Ctrl-V on 🪟) to paste the contents of all the references:
![[ExcelReferencesCheckerPasted.png]]
9. The formulas in the 2<sup>nd</sup> and 3<sup>rd</sup> column will automatically separate the numbering from the contents of the reference and also clean the data of leading, trailing, or multiple inside spaces. In addition you can already see the first duplicate references being highlighted in red ink, but it is not yet obvious where the other duplicate is to be found.
10. Use the drop-down arrow in the header of the 3<sup>rd</sup> column (circled in red in the screenshot above in #8) to sort the references alphabetically:
![[ExcelReferencesCheckerSorted.png]]
11. At this point, you can now scroll through the list and will find duplicates next to each other and can easily identify their respective numbers (e.g. 5 and 88 in the above screenshot).
12. In addition, it is recommended to scroll through the entire sorted list carefully as a last step and manually check for any almost-duplicates, i.e. those that are not an exact match but should be considered duplicates anyway, because one of the two references might simply contain a typo or extra punctuation characters.
Hopefully this process description and the corresponding Excel template is helpful to some...
# References
- https://support.microsoft.com/en-us/office/create-a-bibliography-citations-and-references-17686589-4824-4940-9c69-342c289fa2a5