Idea: Comprehensive testing resources project

#1 Post by **webfork** » Sat May 23, 2020 9:27 am

Sometime back I had a brainstorm for what I think would be an enormously useful resource for a number of software people, including reviewers and bug testers like myself.

In short, having a consistent set of descriptive, clearly-named files and a variety of content could serve as a way to:

Test claims of compatibility with publicly available examples. I'm constantly finding issues with files at work that I can't share. If I could find a public repository of similar files and point developers to it, that would take down a barrier to use.

General bug testing - Give the tools to check for bugs that would normally only appear after weeks and months of use.

Benchmarking - Could be used to test compression / transfer tools thanks to a wide variety of data

This could all be enabled by creating a host of testing resources that are clearly labeled, easy to reference, and most importantly public-facing.

So where's this resource?

Unfortunately, it wasn't long into working on this that I realized I'd bitten off WAY more than I could chew. I could easily spend 10 hours a week for the next year and only take a bite out of what would be a very useful set of testing resources.

Also, it's entirely possible someone's already built something along these lines. In any case, I'm posting it here in the event that someone else can either give suggestions on how to proceed, point me to someone that's already doing it, or take the idea and run with it. I may still pick this project up and move forward with it if there's time and interest, but for now I'll just leave this here.

Components

Potentially useful file types
Various data that could help in testing.

Text files that are slightly different (minor word differences), somewhat different (moved data), and very different with only a few similarities. Also: content that's dramatically rearranged (all paragraphs in a different order)

Wide variety of audio formats with numerous tags applied

Wide variety of new and old video files

Photography with various different metadata

Files that are misnamed or mislabeled with incorrect content (should have a set that are just completely wrong names)

Old and new filetypes (e.g. Word 6.0, Word 2003, Word 2016, etc.)

Old filetypes, e.g. Lotus 1-2-3 and BMP files

Unusual compression files (e.g. .xz)

Hashed files of a set folder/directory (e.g. .SFV)

High compression and low compression files

Password-protected files (old doc files, docx, PDF, etc.)

Write-protected files

HTML generated by various tools (e.g. Word, Dreamweaver, Kompozer, LibreOffice, etc.)

Text files with various types of generated data including phone numbers, SSNs, addresses, etc.

PDFs - various formats, with layered graphics, without, hidden, etc.

Non-Latin text, Asian characters, etc.
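For the misnamed/mislabeled set in the list above, here is a minimal sketch of how a tester might check that a file's extension agrees with its content. The signature table is a small, hand-picked assumption covering a few well-known formats, and `sniff_extensions` and `is_misnamed` are hypothetical helper names, not part of any existing tool.

```python
from pathlib import Path
from typing import Optional, Set

# Leading "magic" bytes for a few common formats, mapped to the extensions
# that can legitimately carry them. Deliberately incomplete (illustration only).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"%PDF-": {".pdf"},
    b"GIF8": {".gif"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    # Many modern formats are ZIP containers, so they share this signature.
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx", ".odt"},
}


def sniff_extensions(data: bytes) -> Optional[Set[str]]:
    """Return the extensions implied by the leading bytes, or None if unknown."""
    for magic, extensions in SIGNATURES.items():
        if data.startswith(magic):
            return extensions
    return None


def is_misnamed(path: Path) -> bool:
    """True when a recognized content signature disagrees with the extension."""
    with path.open("rb") as handle:
        expected = sniff_extensions(handle.read(16))
    # Unknown content is not flagged: we only report confirmed mismatches.
    return expected is not None and path.suffix.lower() not in expected
```

A test set could ship a PNG saved as `photo.pdf` and expect `is_misnamed` to return `True` for it. Plain text files have no signature, so this sketch stays silent on them.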

File content
Some of the things we'd try to include in every file (where possible)

Some standard text explaining where it came from and why (maybe with the project intro)

Some note about what the file is and where it was used, and why it might be interesting to test (this would probably be the meat of the effort)

A block of unique information (probably a paragraph of some generated text content)

---

Other possible benefits
Some other ways this resource could come in handy:

Check whether indexing search programs are able to find a given tool.

A variety of commercial software dumps anything resembling support for old versions of its files (MS Works is the worst about this).

Compression tools get better over time, sure - but what about testing a variety of file types with your compression program and seeing how it's improved?

Developers frequently add some kind of toolset to open or modify a given file type and don't really know how to underline this fact. Yet it can be a lifesaver for the right person.

Create screencaps that feature our site
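On the compression point in the list above, a rough sketch of what a benchmark over test files could look like, using only the compressors in Python's standard library. The codec selection and the `compression_ratios` helper are assumptions for illustration; a real comparison would run the actual tools under test against the public files.

```python
import bz2
import lzma
import zlib
from typing import Dict

# Standard-library compressors, each called with its default settings.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compression_ratios(data: bytes) -> Dict[str, float]:
    """Map codec name to compressed size / original size (lower is better)."""
    if not data:
        raise ValueError("need non-empty data to compute a ratio")
    return {name: len(compress(data)) / len(data) for name, compress in CODECS.items()}


if __name__ == "__main__":
    # Highly repetitive sample text; real test files would be read from disk.
    sample = b"This line is a duplicate of a line elsewhere in the file\n" * 200
    for name, ratio in compression_ratios(sample).items():
        print(f"{name}: {ratio:.3f}")
```

Running the same script against the same files after each release of a tool would give a simple record of how the ratios change over time.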

---

Related

A great example of testing resources: all the important details are right in the file name: http://download.opencontent.netflix.com ... mera/AVIF/

#2 Post by **billon** » Sat May 23, 2020 11:10 am

webfork wrote: ↑Sat May 23, 2020 9:27 am

Wide variety of audio formats with numerous tags applied

Wide variety of new and old video files

https://samples.ffmpeg.org/

#3 Post by **Midas** » Sat May 23, 2020 1:28 pm

Only marginally related (I'm sure there are files somewhere, I just don't know where) regarding digital audio:

wiki.hydrogenaud.io/index.php?title=Hydrogenaudio_Listening_Tests

wiki.hydrogenaud.io/index.php?title=ABX

wiki.hydrogenaud.io/index.php?title=Audio_format_guide

wiki.hydrogenaud.io/index.php?title=Lossless_comparison

#4 Post by **billon** » Sat May 23, 2020 3:00 pm

https://example-files.online-convert.com/

#5 Post by **webfork** » Sat May 23, 2020 5:30 pm

billon wrote: ↑Sat May 23, 2020 3:00 pm https://example-files.online-convert.com/
https://samples.ffmpeg.org/

Nice, thanks

#6 Post by **Midas** » Sun May 24, 2020 5:33 am

billon wrote: ↑https://example-files.online-convert.com/

Their format listing is pretty handy, too: https://www.online-convert.com/file-type.

#7 Post by **vevy** » Sun Jun 14, 2020 2:34 pm

Compression testing: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia (Mentioned here).

Also sfk zip has test files.

#8 Post by **webfork** » Sat Jul 11, 2020 7:44 pm

So this is just one piece of the testing puzzle I hope to one day create: test files to use in comparison software like WinMerge.

---

As you may already know, not all text comparison tools are equal. I came up with a pair of files to help demonstrate what comparison tools call out and what they miss.

For example, a lot of tools can be configured to ignore whitespace, but some only target full lines or whole words rather than individual characters. One of my favorite comparison tools, an add-on for Notepad++ (https://github.com/jsleroy/compare-plugin), is very good at catching content moves where most tools only list add/change/remove.

Conversely, here's what RJ TextEd puts out as its difference report on the two files:

Both files used in these comparisons are available for download below:

Old: https://pastebin.com/TPKmhjxa
New: https://pastebin.com/RFfVaebk

Feedback welcome.

#9 Post by **TP109** » Sat Jul 11, 2020 10:35 pm

webfork wrote: ↑Sat Jul 11, 2020 7:44 pm New: https://pastebin.com/RFfVaebk

I get this message from the above link:
This page is no longer available. It has either expired, been removed by its creator, or removed by one of the Pastebin staff.

#10 Post by **Midas** » Sun Jul 12, 2020 6:49 am

TP109 wrote: This page is no longer available.

Yep, after it went all-out commercial, I noticed Pastebin becoming less and less reliable over time. There are plenty of alternatives, but old habits die hard.

#11 Post by **webfork** » Sun Jul 12, 2020 5:59 pm

TP109 wrote: ↑Sat Jul 11, 2020 10:35 pm This page is no longer available.

Thanks, I'll get that sorted this week and update.

#12 Post by **webfork** » Fri Jul 17, 2020 8:07 pm

I'll just post them here with a small font:

---

OLD FILE

This file is: a way to test old and new files for text comparison software e.g. WinMerge. It's not useful without the "new" file as well.

---

This line has just one word that is different

This line has two spaces between words

This line has a number that has one digit that is different in the new version 4.94

This line has been moved to the top of the page in the new version

This line is a duplicate of a line elsewhere in the file

This line is in all caps in the new file

This line is in Proper Case in the new file

ThisSentenceUsesCamelCase which has been fixed in the new file

This line is a duplicate of a line elsewhere in the file

This line is very similar to another line elsewhere in the file except for this text

This statement
is not wrapped correctly but is fixed in the new file

This line has a lot of white space between it and the next line (this has been fixed in the new version)

This line is a duplicate of a line elsewhere in the file

This line haas missspelled werds

This line is very similar to another line elsewhere in the file

NEW FILE

This file is: a way to test old and new files for text comparison software e.g. WinMerge. It's not useful without the "new" file as well.

---

This line has been moved to the top of the page in the new version

This line has just one word that is unique

This line has two spaces between words but has been fixed in the new file

This line has a number that has one digit that is different in the new version 4.96

This line is a duplicate of a line elsewhere in the file

THIS LINE IS IN ALL CAPS IN THE NEW FILE

This Line Is In Proper Case In The New File

This Sentence Uses CamelCase which has been fixed in the new file

This line is a duplicate of a line elsewhere in the file

This line is very similar to another line elsewhere in the file except for this text

This statement is not wrapped correctly but is fixed in the new file

This line has a lot of white space between it and the next line (this has been fixed in the new version)

This line is a duplicate of a line elsewhere in the file

This line haas missspelled werds

This line is very similar to another line elsewhere in the file
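To show why a moved line is a useful test case, here is a minimal sketch using Python's standard `difflib`. The four lines are a shortened excerpt of the files above with the ordering simplified for illustration, and `summarize` and `find_moves` are hypothetical helper names. A plain line-based diff reports the moved line as one insert plus one delete, and a second pass is needed to recognize the pair as a move.

```python
import difflib

MOVED = "This line has been moved to the top of the page in the new version"

# Simplified excerpt: identical lines, except MOVED changes position.
old = [
    "This line haas missspelled werds",
    "This line is very similar to another line elsewhere in the file",
    MOVED,
    "This line has a lot of white space between it and the next line",
]
new = [
    MOVED,
    "This line haas missspelled werds",
    "This line is very similar to another line elsewhere in the file",
    "This line has a lot of white space between it and the next line",
]


def summarize(old_lines, new_lines):
    """Return (tag, old_slice, new_slice) for every non-equal diff opcode."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [
        (tag, old_lines[i1:i2], new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def find_moves(changes):
    """Lines that were both removed and added, i.e. probably moved."""
    removed = {line for tag, gone, _ in changes if tag in ("delete", "replace") for line in gone}
    added = {line for tag, _, came in changes if tag in ("insert", "replace") for line in came}
    return sorted(removed & added)


if __name__ == "__main__":
    changes = summarize(old, new)
    for tag, gone, came in changes:
        print(tag, gone or came)
    print("moved:", find_moves(changes))
```

This naive pairing would also flag the duplicate lines in the full test files whenever one copy is removed and another added, which is part of why those duplicates are in there.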

#13 Post by **TP109** » Fri Jul 17, 2020 10:24 pm

Alternatives to pastebin:
https://alternativeto.net/software/pastebin/
https://www.makeuseof.com/tag/4-alterna ... -pastebin/

OLDFILE:
https://textbin.net/0HUzxUU73q
NEWFILE:
https://textbin.net/nDvPfJieDy

#14 Post by **vevy** » Mon Sep 28, 2020 5:18 pm

https://github.com/N6UDP/binfile
https://github.com/nmoinvaz/corpora

#15 Post by **Midas** » Tue Sep 29, 2020 3:11 am

TP109 wrote: ↑ Alternatives to pastebin:

https://hastebin.com/

https://hasteb.in/

https://rentry.co/

The last is my preferred because it supports Markdown.

The Portable Freeware Collection Forums
