Idea: Comprehensive testing resources project

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas


#1 Post by webfork »

Some time back I had a brainstorm for what I think would be an enormously useful resource for a number of software people, including reviewers and bug testers like myself.

In short, having a consistent set of descriptive, clearly-named files and a variety of content could serve as a way to:
  • Test claims of compatibility with publicly available examples. I'm constantly finding issues with files at work that I can't share. If I could find a public repository of similar files and point developers to it, that would take down a barrier to use.
  • General bug testing - Give developers the tools to check for bugs that would normally only appear after weeks or months of use.
  • Benchmarking - Could be used to test compression / transfer tools thanks to a wide variety of data.
This could all be enabled by creating a host of testing resources that are clearly labeled, easy to reference, and most importantly public-facing.

So where's this resource?

Unfortunately, it wasn't long into working on this that I realized I'd bitten off WAY more than I could chew. I could easily spend 10 hours a week for the next year and only take a bite out of what would be a very useful set of testing resources.

Also, it's entirely possible someone's already built something along these lines. In any case, I'm posting it here in the event that someone else can either give suggestions on how to proceed, point me to someone that's already doing it, or take the idea and run with it. I may still pick this project up and move forward with it if there's time and interest, but for now I'll just leave this here.

Components

Potentially useful file types
Various data that could help in testing.
  • Text files that are slightly different (minor word differences), somewhat different (moved data), and very different with only a few similarities. Also: content that's dramatically rearranged (all paragraphs in a different order)
  • Wide variety of audio formats with numerous tags applied
  • Wide variety of new and old video files
  • Photography with various different metadata
  • Files that are misnamed or mislabeled with incorrect content (should include a set with completely wrong names)
  • Old and new filetypes (e.g. Word 6.0, Word 2003, Word 2016, etc.)
  • Old filetypes (e.g. Lotus 1-2-3 and BMP files)
  • Unusual compression files (e.g. .xz)
  • Checksum files for a set folder/directory (e.g. .SFV)
  • High compression and low compression files
  • Password-protected files (old doc files, docx, PDF, etc.)
  • Write-protected files
  • HTML generated by various tools (e.g. Word, Dreamweaver, KompoZer, LibreOffice, etc.)
  • Text files with various types of generated data including phone numbers, SSNs, addresses, etc.
  • PDFs - various formats, with layered graphics, without, hidden, etc.
  • Non-Latin text (Asian characters, etc.)
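For the misnamed/mislabeled files in the list above, here is a rough sketch of how a checker might compare a file's extension against its leading "magic" bytes. The function names and the small signature table are my own assumptions for illustration, and the table is far from complete; a real tool would need many more formats.

```python
from pathlib import Path
import tempfile

# A few well-known file signatures mapped to the extensions they usually carry.
# ZIP-based office formats share the ZIP signature, so they are grouped together.
SIGNATURES = {
    b"%PDF": {".pdf"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"GIF8": {".gif"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx", ".odt"},
}


def plausible_extensions(path):
    """Return the set of extensions matching the file's first bytes, or None."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, extensions in SIGNATURES.items():
        if head.startswith(magic):
            return extensions
    return None  # unknown signature: this sketch cannot judge the file


def is_mislabeled(path):
    """True when the content is recognized but the extension does not match it."""
    extensions = plausible_extensions(path)
    return extensions is not None and Path(path).suffix.lower() not in extensions


# Demo: a PDF header saved under a .png name (hypothetical test file).
with tempfile.TemporaryDirectory() as tmp:
    fake_png = Path(tmp) / "report.png"
    fake_png.write_bytes(b"%PDF-1.4\n% demo content\n")
    print(fake_png.name, "mislabeled:", is_mislabeled(fake_png))
```

Files with an unrecognized signature are reported as not mislabeled, which keeps false alarms down but also means plain text files are never flagged.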
File content
Some of the things we'd try to include in every file (where possible)
  • Some standard text explaining where it came from and why (maybe with the project intro)
  • Some note about what the file is and where it was used, and why it might be interesting to test (this would probably be the meat of the effort)
  • A block of unique information (probably a paragraph of some generated text content)
---

Other possible benefits
Some other ways this resource could come in handy:
  • Check whether indexing search programs are able to find a given file.
  • A variety of commercial software dumps anything resembling support for old versions of its files (MS Works is the worst about this).
  • Compression tools get better over time, sure - but what about testing a variety of file types with your compression program and seeing how it's improved?
  • Developers frequently add some kind of toolset to open or modify a given file type and don't really know how to underline this fact. Yet it can be a lifesaver for the right person.
  • Create screencaps that feature our site
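As a rough illustration of the compression point in the list above, this sketch compares how well a few codecs from the Python standard library shrink two very different inputs. The sample data is made up here and only stands in for real test files; the function name is my own.

```python
import bz2
import lzma
import os
import zlib


def compression_ratios(data: bytes) -> dict:
    """Return compressed size / original size for a few stdlib codecs."""
    if not data:
        raise ValueError("need at least one byte of input")
    codecs = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    return {name: len(compress(data)) / len(data) for name, compress in codecs.items()}


# Two made-up inputs standing in for real test files.
repetitive = b"This line is a duplicate of a line elsewhere in the file\n" * 500
random_like = os.urandom(len(repetitive))

for label, sample in (("repetitive", repetitive), ("random", random_like)):
    print(label, compression_ratios(sample))
```

Running the same function against a fixed, public set of files over successive versions of a tool would be one way to see whether it has actually improved.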
---

Related

A great example of testing resources: all the important details are right in the file name: http://download.opencontent.netflix.com ... mera/AVIF/

billon
Posts: 843
Joined: Sat Jun 23, 2012 4:28 pm

Re: Idea: Comprehensive testing resources project

#2 Post by billon »

webfork wrote: Sat May 23, 2020 9:27 am
  • Wide variety of audio formats with numerous tags applied
  • Wide variety of new and old video files
https://samples.ffmpeg.org/

Midas
Posts: 6710
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: Idea: Comprehensive testing resources project

#3 Post by Midas »




Midas
Posts: 6710
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: Idea: Comprehensive testing resources project

#6 Post by Midas »


Their format listing is pretty handy, too: https://www.online-convert.com/file-type.

vevy
Posts: 795
Joined: Tue Sep 10, 2019 11:17 am

Re: Idea: Comprehensive testing resources project

#7 Post by vevy »

Compression testing: http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia (Mentioned here).

Also sfk zip has test files.

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

Re: Idea: Comprehensive testing resources project

#8 Post by webfork »

So this is just one piece of the testing puzzle I hope to one day create: test files to use in comparison software like WinMerge.

---

As you may already know, not all text comparison tools are equal. I came up with a file to help demonstrate what comparison tools call out and what they miss.

For example, a lot of tools can be configured to ignore whitespace, but some only target full lines or whole words rather than individual characters. One of my favorite comparison tools, an add-on for Notepad++ (https://github.com/jsleroy/compare-plugin), is very good at catching content moves where most tools only list add/change/remove.

Image

Conversely, here's what RJ TextEd puts out as its difference report for the two files:

Image

Both files used in these comparisons are available for download below:

Old: https://pastebin.com/TPKmhjxa
New: https://pastebin.com/RFfVaebk

Feedback welcome.

TP109
Posts: 571
Joined: Sat Apr 08, 2006 7:12 pm
Location: Midwestern US

Re: Idea: Comprehensive testing resources project

#9 Post by TP109 »

webfork wrote: Sat Jul 11, 2020 7:44 pm New: https://pastebin.com/RFfVaebk
I get this message from the above link:
This page is no longer available. It has either expired, been removed by its creator, or removed by one of the Pastebin staff.

Midas
Posts: 6710
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: Idea: Comprehensive testing resources project

#10 Post by Midas »

TP109 wrote:This page is no longer available.

Yep, after it went all-out commercial, I noticed Pastebin becoming less and less reliable over time. There are plenty of alternatives, but old habits die hard. :neutral_face:

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

Re: Idea: Comprehensive testing resources project

#11 Post by webfork »

TP109 wrote: Sat Jul 11, 2020 10:35 pm This page is no longer available.
Thanks, I'll get that sorted this week and update.

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

Re: Idea: Comprehensive testing resources project

#12 Post by webfork »

I'll just post them here with a small font:

---

OLD FILE

This file is: a way to test old and new files for text comparison software e.g. WinMerge. It's not useful without the "new" file as well.

---

This line has just one word that is different

This line has two spaces between words

This line has a number that has one digit that is different in the new version 4.94

This line has been moved to the top of the page in the new version

This line is a duplicate of a line elsewhere in the file

This line is in all caps in the new file

This line is in Proper Case in the new file

ThisSentenceUsesCamelCase which has been fixed in the new file

This line is a duplicate of a line elsewhere in the file

This line is very similar to another line elsewhere in the file except for this text

This statement
is not wrapped correctly but is fixed in the new file

This line has a lot of white space between it and the next line (this has been fixed in the new version)




This line is a duplicate of a line elsewhere in the file

This line haas missspelled werds

This line is very similar to another line elsewhere in the file


NEW FILE

This file is: a way to test old and new files for text comparison software e.g. WinMerge. It's not useful without the "new" file as well.

---

This line has been moved to the top of the page in the new version

This line has just one word that is unique

This line has two spaces between words but has been fixed in the new file

This line has a number that has one digit that is different in the new version 4.96

This line is a duplicate of a line elsewhere in the file

THIS LINE IS IN ALL CAPS IN THE NEW FILE

This Line Is In Proper Case In The New File

This Sentence Uses CamelCase which has been fixed in the new file

This line is a duplicate of a line elsewhere in the file

This line is very similar to another line elsewhere in the file except for this text

This statement is not wrapped correctly but is fixed in the new file

This line has a lot of white space between it and the next line (this has been fixed in the new version)

This line is a duplicate of a line elsewhere in the file

This line haas missspelled werds

This line is very similar to another line elsewhere in the file
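To show what "catching a move" means for a pair of files like the two above, here is a minimal sketch built on Python's difflib. It is not how WinMerge, RJ TextEd, or the Notepad++ Compare plugin work internally; the function name is my own, and the sample lists are a trimmed, reordered subset of the old and new files, chosen only to keep the example short.

```python
import difflib


def classify_changes(old_lines, new_lines):
    """Split a line diff into moved, removed, and added lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    # A line reported as both removed and added, verbatim, has really just moved.
    moved = [line for line in removed if line in added]
    removed = [line for line in removed if line not in moved]
    added = [line for line in added if line not in moved]
    return moved, removed, added


old = [
    "This line has just one word that is different",
    "This line is a duplicate of a line elsewhere in the file",
    "This line haas missspelled werds",
    "This line has been moved to the top of the page in the new version",
]
new = [
    "This line has been moved to the top of the page in the new version",
    "This line has just one word that is unique",
    "This line is a duplicate of a line elsewhere in the file",
    "This line haas missspelled werds",
]

moved, removed, added = classify_changes(old, new)
print("moved:  ", moved)
print("removed:", removed)
print("added:  ", added)
```

This simple membership check does not handle duplicate lines carefully, which is exactly the kind of edge case the duplicate lines in the test files are meant to expose.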



Midas
Posts: 6710
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: Idea: Comprehensive testing resources project

#15 Post by Midas »

TP109 wrote: Alternatives to pastebin:


:information_source: The last is my preferred because it supports Markdown.
