Is there a program that can remove duplicate lines in text files?

uwotm8
Posts: 30
Joined: Mon Mar 23, 2015 2:30 am

#1 Post by uwotm8 »

Is there a program that can remove duplicate lines in text files?

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

#2 Post by webfork »

uwotm8 wrote: Fri Feb 15, 2019 2:28 am Is there a program that can remove duplicate lines in text files?
Great question. I've been looking for something like this myself. My main requirement was that it not simply remove the duplicates automatically; I needed to have them identified first. I have a trick that works with LibreOffice Calc and Excel, but it's fairly complex. So far it's RJ TextEd that will bookmark duplicate lines.

To just zap them Notepad3 has something in the menu bar: Edit - Lines - Remove Duplicate lines. Interestingly, DocPad (not portable) says it will discard duplicate *paragraphs*, which I haven't tested, but it sounds cool.

EDIT: if you need something quick, there's https://www.textevo.com/

Related thread: Finding duplicate phrases
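
For anyone who wants duplicates identified before anything is removed, as described above, here is a minimal Python sketch. It is not taken from any of the tools mentioned; the function name `find_duplicate_lines` and the sample data are made up for illustration. It reports every repeated line together with the line numbers where it occurs and changes nothing.

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Map each repeated line to the 1-based line numbers where it occurs."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Ignore the line ending so "apple\n" and a final "apple" compare equal.
        positions[line.rstrip("\r\n")].append(number)
    # Keep only the lines that were seen more than once.
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


if __name__ == "__main__":
    # Hypothetical sample input; replace with open("yourfile.txt") as needed.
    sample = ["apple\n", "pear\n", "apple\n", "fig\n", "pear\n", "apple\n"]
    for text, nums in find_duplicate_lines(sample).items():
        print(f"{text!r} appears on lines {nums}")
```

Passing an open file object instead of the sample list should work the same way, since the function only iterates over lines.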

__philippe
Posts: 687
Joined: Wed Jun 26, 2013 2:09 am

#3 Post by __philippe »

uwotm8 wrote: Fri Feb 15, 2019 2:28 am Is there a program that can remove duplicate lines in text files?
Quick and dirty solution(s), ...provided you don't mind a bit of CLI wrestling... :wink:

Uniq
(part of unxutils suite)

c:\mytools>uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).

  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines
  -D, --all-repeated    print all duplicate lines
  -f, --skip-fields=N   avoid comparing the first N fields
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -w, --check-chars=N   compare no more than N characters in lines
  -N                    same as -f N
  +N                    same as -s N
      --help            display this help and exit
      --version         output version information and exit

A field is a run of whitespace, then non-whitespace characters.
Fields are skipped before chars.

Report bugs to <bug-textutils@gnu.org>.

OR

Uniq
(part of BusyBox)

c:\mytools>busybox uniq --help
BusyBox v1.27.0-FRP-1035-g74163a5 (2017-02-09 08:42:39 GMT) multi-call binary.

Usage: uniq [-cdu][-f,s,w N] [INPUT [OUTPUT]]

Discard duplicate lines

        -c      Prefix lines by the number of occurrences
        -d      Only print duplicate lines
        -u      Only print unique lines
        -f N    Skip first N fields
        -s N    Skip first N chars (after any skipped fields)
        -w N    Compare N characters in line
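
The -c, -d and -u options listed above all work on runs of successive identical lines. As a rough illustration only, here is a Python sketch of the behaviour the help text describes. It is not the actual uniq source, and the function names are hypothetical.

```python
from itertools import groupby


def runs(lines):
    """Collapse successive identical lines into (count, line) pairs (like -c)."""
    return [(len(list(group)), line) for line, group in groupby(lines)]


def uniq_default(lines):
    """One copy of every run of successive identical lines (no option)."""
    return [line for _, line in runs(lines)]


def uniq_repeated(lines):
    """Only lines whose run is longer than one (like -d)."""
    return [line for count, line in runs(lines) if count > 1]


def uniq_unique(lines):
    """Only lines whose run has exactly one member (like -u)."""
    return [line for count, line in runs(lines) if count == 1]


if __name__ == "__main__":
    sample = ["a", "a", "b", "c", "c", "c", "a"]
    print(runs(sample))
    print(uniq_default(sample))
    print(uniq_repeated(sample))
    print(uniq_unique(sample))
```

Note that the final "a" in the sample is treated as a new run, because only neighbouring lines are compared.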

tproli
Posts: 1172
Joined: Sat Sep 09, 2006 10:14 am
Location: Hungary

#4 Post by tproli »

EverEdit also has a command for this - Edit - Delete - Delete Duplicated Lines
https://www.portablefreeware.com/index.php?id=2538

uwotm8
Posts: 30
Joined: Mon Mar 23, 2015 2:30 am

#5 Post by uwotm8 »

As a result of a quick googling, I found this:
7 Ways To Remove Duplicate Lines in Text Files.
Many thanks, yes, I had found that page, too. Some of those suggestions do not work properly or are no longer available. And I do not want to do it online.

webfork wrote: So far it's RJ TextEd that will bookmark duplicate lines
Thank you, will have a look at it.

webfork wrote: To just zap them Notepad3 has something in the menu bar: Edit - Lines - Remove Duplicate lines.
Thank you, that works well, just tried it.

webfork wrote: EDIT: if you need something quick, there's https://www.textevo.com/
Many thanks, but I have some concerns about such online services.

From where I stand what you're looking for is a specialized kind of software generally called concordancers (https://en.wikipedia.org/wiki/Concordancer). A decade back I would have some ready suggestions for you but too much time has passed since.
Thank you for the link.

Thank you very much, philippe,

That for me looks a bit complicated somehow.

tproli wrote: EverEdit also has a command for this - Edit - Delete - Delete Duplicated Lines https://www.portablefreeware.com/index.php?id=2538
Thank you. It says "Released on 29 Nov 2013", but it will work anyway, I guess.

__philippe
Posts: 687
Joined: Wed Jun 26, 2013 2:09 am

#6 Post by __philippe »

uwotm8 wrote: Sun Feb 17, 2019 3:09 pm Thank you very much, philippe,
That for me looks a bit complicated somehow.
@uwotm8
Fear not, uniq's basic usage is as simple as pie... 8)

Consider, if you will:

Input file: test-in.txt

C:\mytools>cat test-in.txt
EenyMeenyMinyMoe-0
EenyMeenyMinyMoe-1
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-Z
EenyMeenyMinyMoe-Z

Remove duplicates, send result to console:

C:\mytools>uniq test-in.txt
EenyMeenyMinyMoe-0
EenyMeenyMinyMoe-1
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-Z

Remove duplicates, send result to output file:

C:\mytools>uniq test-in.txt > test-out.txt

...and check out uniq 101 to delve deeper into the delights of uniq's arcane options... :wink:
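
One caveat worth spelling out: per the help text quoted earlier in the thread, uniq discards "successive identical lines", and the sample file above happens to have all its duplicates next to each other. If the duplicates are scattered through the file, they would presumably survive unless the input is sorted first. As a hedged alternative, here is a small Python sketch that keeps the first occurrence of every line and preserves the original order. The function name `remove_duplicate_lines` is invented for this example, and the file names are simply the ones used above.

```python
from pathlib import Path


def remove_duplicate_lines(text):
    """Keep the first occurrence of each line, in original order."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:  # first time this exact line shows up
            seen.add(line)
            kept.append(line)
    # Re-join with newlines; an empty input stays empty.
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    source = Path("test-in.txt")  # file name borrowed from the example above
    if source.exists():
        result = remove_duplicate_lines(source.read_text())
        Path("test-out.txt").write_text(result)
```

Unlike sorting before running uniq, this leaves the remaining lines in the order they first appeared, at the cost of holding every distinct line in memory.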

uwotm8
Posts: 30
Joined: Mon Mar 23, 2015 2:30 am

#7 Post by uwotm8 »

Okay, yes, I understand, very good, works great, many thanks!

__philippe
Posts: 687
Joined: Wed Jun 26, 2013 2:09 am

#8 Post by __philippe »

Attaboy ! Chalk up another recruit to the CLI clique... :mrgreen: