Is there a program that can remove duplicate lines in text files?

Posted: Fri Feb 15, 2019 2:28 am
by uwotm8
Is there a program that can remove duplicate lines in text files?

Re: Is there a program that can remove duplicate lines in text files?

Posted: Fri Feb 15, 2019 7:07 am
by webfork
uwotm8 wrote: Fri Feb 15, 2019 2:28 am Is there a program that can remove duplicate lines in text files?
Great question. I've been looking for something like this myself. My main requirement was that it shouldn't just remove duplicates automatically; I needed to have them identified first. I have a trick that works with LibreOffice Calc and Excel, but that's fairly complex. So far it's RJ TextEd that will bookmark duplicate lines.
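
For anyone who wants duplicates identified rather than removed, a short script can do the reporting. What follows is only a rough Python sketch, not something taken from any of the editors mentioned in this thread. The function name find_duplicate_lines is made up for illustration, and it assumes lines are compared exactly (case-sensitive, ignoring only the trailing newline).

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Return {line_text: [1-based line numbers]} for text seen more than once."""
    positions = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        # Strip only the trailing newline so leading/trailing spaces still count.
        positions[line.rstrip("\n")].append(number)
    return {text: nums for text, nums in positions.items() if len(nums) > 1}


if __name__ == "__main__":
    # Hypothetical sample data; with a real file, pass an open file object instead.
    sample = ["apple\n", "pear\n", "apple\n", "plum\n", "pear\n", "apple\n"]
    for text, nums in find_duplicate_lines(sample).items():
        print(f"{text!r} appears on lines {nums}")
```

Nothing is modified here; the script only lists where each repeated line occurs, so you can decide what to delete by hand.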

To just zap them Notepad3 has something in the menu bar: Edit - Lines - Remove Duplicate lines. Interestingly, DocPad (not portable) says it will discard duplicate *paragraphs* which I haven't tested but sounds cool.

EDIT: if you need something quick, there's https://www.textevo.com/

Related thread: Finding duplicate phrases

Re: Is there a program that can remove duplicate lines in text files?

Posted: Fri Feb 15, 2019 12:35 pm
by __philippe
uwotm8 wrote: Fri Feb 15, 2019 2:28 am Is there a program that can remove duplicate lines in text files?
Quick and dirty solution(s), ...provided you don't mind a bit of CLI wrestling... :wink:

Uniq
(part of unxutils suite)

c:\mytools>uniq --help
Usage: uniq [OPTION]... [INPUT [OUTPUT]]
Discard all but one of successive identical lines from INPUT (or
standard input), writing to OUTPUT (or standard output).

  -c, --count           prefix lines by the number of occurrences
  -d, --repeated        only print duplicate lines
  -D, --all-repeated    print all duplicate lines
  -f, --skip-fields=N   avoid comparing the first N fields
  -i, --ignore-case     ignore differences in case when comparing
  -s, --skip-chars=N    avoid comparing the first N characters
  -u, --unique          only print unique lines
  -w, --check-chars=N   compare no more than N characters in lines
  -N                    same as -f N
  +N                    same as -s N
      --help            display this help and exit
      --version         output version information and exit

A field is a run of whitespace, then non-whitespace characters.
Fields are skipped before chars.

Report bugs to <bug-textutils@gnu.org>.
OR

Uniq
(part of BusyBox)

c:\mytools>busybox uniq --help
BusyBox v1.27.0-FRP-1035-g74163a5 (2017-02-09 08:42:39 GMT) multi-call binary.

Usage: uniq [-cdu][-f,s,w N] [INPUT [OUTPUT]]

Discard duplicate lines

        -c      Prefix lines by the number of occurrences
        -d      Only print duplicate lines
        -u      Only print unique lines
        -f N    Skip first N fields
        -s N    Skip first N chars (after any skipped fields)
        -w N    Compare N characters in line
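
One thing the help text says that is easy to miss: uniq only discards *successive* identical lines, so duplicates that are not next to each other survive unless the file is sorted first. Below is a minimal Python sketch of the difference. The helper names drop_adjacent_duplicates (roughly what plain uniq does) and drop_all_duplicates (keeps the first occurrence anywhere in the input) are made up for illustration; this is not a reimplementation of uniq.

```python
def drop_adjacent_duplicates(lines):
    """Keep a line only if it differs from the line just before it."""
    result = []
    for line in lines:
        if not result or line != result[-1]:
            result.append(line)
    return result


def drop_all_duplicates(lines):
    """Keep the first occurrence of each line, preserving original order."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


if __name__ == "__main__":
    data = ["a", "a", "b", "a", "b", "b"]
    print(drop_adjacent_duplicates(data))  # ['a', 'b', 'a', 'b']
    print(drop_all_duplicates(data))       # ['a', 'b']
```

If the order of lines does not matter, sorting the file before running uniq gives the same set of lines as the second function.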

Re: Is there a program that can remove duplicate lines in text files?

Posted: Sat Feb 16, 2019 6:30 am
by tproli
EverEdit also has a command for this - Edit - Delete - Delete Duplicated Lines
https://www.portablefreeware.com/index.php?id=2538

Re: Is there a program that can remove duplicate lines in text files?

Posted: Sun Feb 17, 2019 3:09 pm
by uwotm8
As a result of a quick googling, I found this:
7 Ways To Remove Duplicate Lines in Text Files.
Many thanks, yes, I had found that page, too. Some of those suggestions do not work properly or are not available anymore, and I do not want to do it online.
So far it's RJ TextEd that will bookmark duplicate lines
Thank you, will have a look at it.
To just zap them Notepad3 has something in the menu bar: Edit - Lines - Remove Duplicate lines.
Thank you, that works well, just tried it.
EDIT: if you need something quick, there's https://www.textevo.com/
Many thanks, but I somehow have some concerns about such online services.
From where I stand what you're looking for is a specialized kind of software generally called concordancers (https://en.wikipedia.org/wiki/Concordancer). A decade back I would have some ready suggestions for you but too much time has passed since.
Thank you for the link.

Thank you very much, philippe,

That for me looks a bit complicated somehow.
EverEdit also has a command for this - Edit - Delete - Delete Duplicated Lines
https://www.portablefreeware.com/index.php?id=2538
Thank you, "Released on 29 Nov 2013", but it will work anyway, I guess.

Re: Is there a program that can remove duplicate lines in text files?

Posted: Mon Feb 18, 2019 1:39 am
by __philippe
uwotm8 wrote: Sun Feb 17, 2019 3:09 pm Thank you very much, philippe,
That for me looks a bit complicated somehow.
@uwotm8
Fear not, basic uniq usage is as simple as pie... 8)

Consider, if you will:

Input file: test-in.txt

C:\mytools>cat test-in.txt
EenyMeenyMinyMoe-0
EenyMeenyMinyMoe-1
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-Z
EenyMeenyMinyMoe-Z
Remove duplicates, send result to console:

C:\mytools>uniq test-in.txt
EenyMeenyMinyMoe-0
EenyMeenyMinyMoe-1
EenyMeenyMinyMoe-2
EenyMeenyMinyMoe-3
EenyMeenyMinyMoe-4
EenyMeenyMinyMoe-A
EenyMeenyMinyMoe-Z
Remove duplicates, send result to output file:

C:\mytools>uniq test-in.txt > test-out.txt
...and check out uniq 101 to delve deeper into the delights of uniq's arcane options... :wink:
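
As a taste of those options, -c prefixes each line with its number of occurrences. Below is a rough Python equivalent using itertools.groupby, which, like uniq, only groups lines that sit next to each other. The function name count_runs is hypothetical, and the printed format is not meant to match uniq's output exactly.

```python
from itertools import groupby


def count_runs(lines):
    """Return (count, line) pairs for each run of identical adjacent lines."""
    return [(sum(1 for _ in group), text) for text, group in groupby(lines)]


if __name__ == "__main__":
    # First few lines of the test-in.txt example, rebuilt as a list.
    sample = (
        ["EenyMeenyMinyMoe-0", "EenyMeenyMinyMoe-1"]
        + ["EenyMeenyMinyMoe-2"] * 2
        + ["EenyMeenyMinyMoe-3"] * 3
    )
    for count, text in count_runs(sample):
        print(count, text)
```

Counting runs before deleting anything is a cheap way to see how many duplicates a file actually contains.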

Re: Is there a program that can remove duplicate lines in text files?

Posted: Mon Feb 18, 2019 2:54 pm
by uwotm8
Okay, yes, I understand, very good, works great, many thanks!

Re: Is there a program that can remove duplicate lines in text files?

Posted: Mon Feb 18, 2019 6:08 pm
by __philippe
Attaboy ! Chalk up another recruit to the CLI clique... :mrgreen: