Is there a program that can remove duplicate lines in text files?
Posted: Fri Feb 15, 2019 2:28 am
TPFC Forums: https://www.portablefreeware.com/forums/viewtopic.php?t=24434
Great question. I've been looking for something like this myself. My main requirement was that it shouldn't just remove them automatically; I needed them identified first. I have a trick that works with LibreOffice Calc and Excel, but it's fairly complex. So far it's RJ TextEd that will bookmark duplicate lines.
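If you want duplicates identified before anything is deleted, a short script can report them instead of removing them. Below is a minimal Python sketch, assuming the file fits in memory; `find_duplicates` is a hypothetical name, not part of any tool mentioned in this thread. It maps each repeated line to the 1-based line numbers where it occurs, which is roughly the information an editor bookmark gives you.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicates(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map every line that occurs more than once to its 1-based line numbers.

    Hypothetical helper: nothing is removed, duplicates are only reported.
    Comparison is exact (case- and whitespace-sensitive) once the line
    terminator has been stripped.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        positions[line.rstrip("\r\n")].append(number)
    # Keep only lines seen at least twice; dicts preserve first-seen order.
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


if __name__ == "__main__":
    sample = ["apple\n", "pear\n", "apple\n", "plum\n", "pear\n", "apple\n"]
    for text, nums in find_duplicates(sample).items():
        print(f"{text!r} appears on lines {nums}")
```

To use it on a real file, pass `open("test-in.txt", encoding="utf-8")` instead of the sample list; the UTF-8 encoding is an assumption about your file.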
Quick-and-dirty solution(s), provided you don't mind a bit of CLI wrestling:
Code:
c:\mytools>uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).
  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines
  -D, --all-repeated    print all duplicate lines
  -f, --skip-fields=N   avoid comparing the first N fields
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -w, --check-chars=N   compare no more than N characters in lines
  -N                    same as -f N
  +N                    same as -s N
      --help            display this help and exit
      --version         output version information and exit
A field is a run of whitespace, then non-whitespace characters.
Fields are skipped before chars.
Report bugs to <bug-textutils@gnu.org>.
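The key phrase in the help text is "successive identical lines": uniq compares each line only with the one before it. Here is a minimal Python sketch of that behaviour and of the `-c`, `-d`, `-D` and `-u` modes, built on `itertools.groupby`, which likewise groups only consecutive equal items. The names `runs` and `uniq` and the `mode` strings are hypothetical stand-ins for the flags, and the exact output formatting of the real tool (such as the padding `-c` puts before counts) is not reproduced.

```python
from itertools import groupby
from typing import Iterable, List, Optional, Tuple

MODES = (None, "c", "d", "D", "u")


def runs(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Collapse runs of successive identical lines into (count, line) pairs."""
    return [(sum(1 for _ in group), line) for line, group in groupby(lines)]


def uniq(lines: Iterable[str], mode: Optional[str] = None) -> List[str]:
    """Approximate the uniq modes (hypothetical API).

    mode=None -> one copy of every run        (plain uniq)
    mode="c"  -> one copy, prefixed by count   (-c)
    mode="d"  -> one copy of repeated runs     (-d)
    mode="D"  -> every line of repeated runs   (-D)
    mode="u"  -> only runs of length one       (-u)
    """
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    out: List[str] = []
    for count, line in runs(lines):
        if mode is None:
            out.append(line)
        elif mode == "c":
            out.append(f"{count} {line}")
        elif mode == "d" and count > 1:
            out.append(line)
        elif mode == "D" and count > 1:
            out.extend([line] * count)
        elif mode == "u" and count == 1:
            out.append(line)
    return out


if __name__ == "__main__":
    sample = ["a", "a", "b", "a", "c", "c", "c"]
    print(uniq(sample))       # ['a', 'b', 'a', 'c']  (the lone later 'a' survives)
    print(uniq(sample, "d"))  # ['a', 'c']
    print(uniq(sample, "u"))  # ['b', 'a']
```

The second `a` in the output is the point: a repeat that is not directly below its twin is not treated as a duplicate.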
Code:
c:\mytools>busybox uniq --help
BusyBox v1.27.0-FRP-1035-g74163a5 (2017-02-09 08:42:39 GMT) multi-call binary.
Usage: uniq [-cdu][-f,s,w N] [INPUT [OUTPUT]]
Discard duplicate lines
-c Prefix lines by the number of occurrences
-d Only print duplicate lines
-u Only print unique lines
-f N Skip first N fields
-s N Skip first N chars (after any skipped fields)
-w N Compare N characters in line
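Both help texts describe the same comparison pipeline: skip N fields (`-f`), then N characters (`-s`, "after any skipped fields"), then compare at most N characters (`-w`); the GNU version also has `-i` for case-insensitive matching, which this BusyBox build does not list. Below is a minimal Python sketch of that pipeline, assuming "whitespace" can be approximated by `str.isspace`; `compare_key` and `uniq_by_key` are hypothetical names, and the real tools may differ in edge cases such as tabs or multibyte characters.

```python
from itertools import groupby
from typing import Iterable, List, Optional


def compare_key(line: str, skip_fields: int = 0, skip_chars: int = 0,
                check_chars: Optional[int] = None,
                ignore_case: bool = False) -> str:
    """Return the part of `line` that uniq-style options would compare."""
    pos, n = 0, len(line)
    # -f N: a field is a run of whitespace followed by non-whitespace.
    for _ in range(skip_fields):
        while pos < n and line[pos].isspace():
            pos += 1
        while pos < n and not line[pos].isspace():
            pos += 1
    # -s N: characters are skipped after the fields.
    pos = min(pos + skip_chars, n)
    key = line[pos:]
    # -w N: compare no more than N characters.
    if check_chars is not None:
        key = key[:check_chars]
    # -i: ignore differences in case.
    return key.casefold() if ignore_case else key


def uniq_by_key(lines: Iterable[str], **options) -> List[str]:
    """Keep the first line of each run whose compare_key values are equal."""
    return [next(group) for _, group in
            groupby(lines, key=lambda line: compare_key(line, **options))]


if __name__ == "__main__":
    data = ["1 apple", "2 apple", "3 pear", "4 Pear"]
    print(uniq_by_key(data, skip_fields=1))                    # ['1 apple', '3 pear', '4 Pear']
    print(uniq_by_key(data, skip_fields=1, ignore_case=True))  # ['1 apple', '3 pear']
```

As with the real tools, the first line of each matching run is the one that is kept.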
Quote: After a quick googling, I found this: 7 Ways To Remove Duplicate Lines in Text Files.
Many thanks, yes, I had found that page, too. Some of those suggestions do not work properly or are no longer available. And I do not want to do it online.
Quote: So far it's RJ TextEd that will bookmark duplicate lines.
Thank you, will have a look at it.
Quote: To just zap them, Notepad3 has something in the menu bar: Edit - Lines - Remove Duplicate lines.
Thank you, that works well, just tried it.
Quote: EDIT: if you need something quick, there's https://www.textevo.com/
Many thanks, but I have some concerns about such online services.
Quote: From where I stand, what you're looking for is a specialized kind of software generally called a concordancer (https://en.wikipedia.org/wiki/Concordancer). A decade ago I would have had some ready suggestions for you, but too much time has passed since.
Thank you for the link.
Quote: EverEdit also has a command for this: Edit - Delete - Delete Duplicated Lines
https://www.portablefreeware.com/index.php?id=2538
Thank you, "Released on 29 Nov 2013", but it will work anyway, I guess.
@uwotm8
Code:
C:\mytools>cat test-in.txt
EenyMeenyMinyMoe-0
EenyMeenyMinyMoe-1
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-Z
EenyMeenyMinyMoe-Z
Code:
C:\mytools>uniq test-in.txt
EenyMeenyMinyMoe-0
EenyMeenyMinyMoe-1
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-Z
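uniq gave the expected result here because every group of duplicates in test-in.txt is already adjacent. On an unsorted file it would keep non-adjacent repeats; the usual command-line fix is to run `sort` before `uniq` (or use `sort -u`) with the GNU or BusyBox `sort` rather than the built-in Windows one, which also reorders the lines. If the original order matters, here is a minimal Python sketch that keeps the first occurrence of each line wherever it appears, assuming the set of distinct lines fits in memory; `dedupe_keep_order` is a hypothetical name.

```python
from itertools import groupby
from typing import Iterable, List


def dedupe_keep_order(lines: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each line, wherever later repeats appear."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


if __name__ == "__main__":
    unsorted = ["Moe-3", "Moe-1", "Moe-3", "Moe-A", "Moe-1"]
    # uniq-style collapsing only merges neighbours, so nothing changes here:
    print([line for line, _ in groupby(unsorted)])
    # ['Moe-3', 'Moe-1', 'Moe-3', 'Moe-A', 'Moe-1']
    print(dedupe_keep_order(unsorted))  # ['Moe-3', 'Moe-1', 'Moe-A']
    print(sorted(set(unsorted)))        # ['Moe-1', 'Moe-3', 'Moe-A'], similar to sort -u
```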
Code:
C:\mytools>uniq test-in.txt > test-out.txt
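The redirect simply saves uniq's standard output to a file; going by the usage line `uniq [OPTION]... [INPUT [OUTPUT]]`, `uniq test-in.txt test-out.txt` should produce the same file without `>`. Below is a rough Python counterpart that streams one file into another, collapsing runs of identical lines the way plain uniq does. Assumptions: UTF-8 text, and lines are compared without their terminators, so a last line lacking a final newline still matches the line above it; that is a design choice of this sketch, not documented uniq behaviour. `uniq_file` is a hypothetical name.

```python
import sys
from typing import Optional


def uniq_file(src: str, dst: str, encoding: str = "utf-8") -> int:
    """Copy src to dst, collapsing runs of identical lines; return lines written.

    Streams line by line, so only the previous line is kept in memory.
    """
    written = 0
    previous: Optional[str] = None
    # Text mode reads "\r\n" as "\n" and writes the platform's line ending.
    with open(src, encoding=encoding) as fin, \
            open(dst, "w", encoding=encoding) as fout:
        for raw in fin:
            line = raw.rstrip("\n")  # so a last line without "\n" still matches
            if line != previous:
                fout.write(line + "\n")
                written += 1
                previous = line
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python uniq_file.py test-in.txt test-out.txt
    print(uniq_file(sys.argv[1], sys.argv[2]), "lines written")
```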