CLI Database Discussions

Discuss anything related to command line tools here.
vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#181 Post by vevy » Thu May 21, 2020 7:10 pm

Thanks a lot for your response. It is clear and to the point, which I appreciate. You should try to poke as many holes as you can in what I propose so that we all (including myself) can see if it can hold.

A few quick remarks about alternativeto.net:
  • You are a bit harsher on it than I would be! :) From a user's perspective, they are my go-to place for "related" software.
  • Whether it's the tags or some algorithm behind the scenes, their related-software suggestions work more often than not for me, especially when tried in multiple iterations (related, then related of related). I discovered a lot of gems through that system.
  • Also, their tags largely use unified terminology, meaning there is only one phrasing used consistently as the tag for a particular feature.
  • Our "Similar/alternative apps" is the kind of feature that requires a whole lot of maintenance and manual work beyond occasional use, which is what I want to avoid through systematic relations. It also works for largely overlapping apps rather than intersecting ones. It is also kind of cryptic (in what way are they similar?) and subjective about what level of similarity warrants it.
Andrew Lee wrote:
Thu May 21, 2020 3:50 pm
... tries to anticipate all potential search queries.
..."extract video from mpeg without quality loss". We need an index for that.
It's never-ending manual work to maintain the index...
If that is what comes across, I realize now I should have been clearer. Please bear with me as I try to explain the system that I have in mind:
  • My target audience is someone like me (and, I believe, many tech-minded people): interested in finding a piece of software that can do a certain job, but who also wants to get their pick of the litter rather than install the first "youtube mp3 downloader" they find on Google. I want to make it easier to find these tools.
  • I also expect that user to be able to do a little bit of the work themselves.
  • "extract video from mpeg without quality loss" is something I would never bother adding even if I had the time:
    1. It is not one use case. It is multiple. A tool that does that would be tagged: ("transcode video" = "convert video to video"), ("mux video" = "convert video to video losslessly"), ("video to mpeg"), ("convert to mpeg"; which would also cover images to mpeg), ("avi to mpeg"), ("mp4 to mpeg"), ("flv to mpeg"), etc.
    2. As I expect the user to have a minimum of tech sense, and as they are searching for CLI tools, and on our site, I would not plan for (nor care to prepare for) them using a phrase like "extract video from mpeg". It's not about getting a part from a whole. I would expect something like "convert" at the very least.
    3. See the 2 main reasons for aliases in my previous post, but here, as the user searches for "convert video to mpeg without quality loss", the search engine should pick matches like "convert", "mpeg" (or "mpg"), "loss" and show the relevant tags at the top (my suggestion), and also the entry results with the most relevant tags.
    4. I want to cover the common terms/phrases, not every unique version. I just want to help the user described above find the proverbial thread.
    5. The format variations, like "avi to mpeg" would be in the dozens for all-in-one tools like ffmpeg, but I believe that the problem they may pose is not mainly the effort to add them, but the presentation; i.e. hiding them in lists, but showing them if they have a search hit, in the entry's page, under a "more" button, whatever it may be.
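A minimal sketch of the tag-and-alias matching described in the list above, assuming exact keyword matching and a simple overlap count as the score. The alias tables, the stopword list, the function names and the "exampletool" entry are all hypothetical; nothing here reflects how the site's search actually works.

```python
import re

# Hypothetical alias tables: each alternative wording maps to one canonical form.
TAG_ALIASES = {
    "transcode video": "convert video to video",
    "mux video": "convert video to video losslessly",
}
WORD_ALIASES = {"mpg": "mpeg"}
STOPWORDS = {"a", "the", "to", "from", "without"}


def canonical(tag):
    """Return the canonical phrasing for a tag (aliases collapse to one tag)."""
    return TAG_ALIASES.get(tag, tag)


def keywords(text):
    """Lower-case, split into words, drop stopwords, normalise word aliases."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {WORD_ALIASES.get(w, w) for w in words if w not in STOPWORDS}


def rank_tags(query, tags):
    """Tags sharing at least one keyword with the query, best match first."""
    wanted = keywords(query)
    scored = {(len(wanted & keywords(canonical(t))), canonical(t)) for t in tags}
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [tag for score, tag in ranked if score > 0]


def rank_entries(query, entries):
    """Entries ordered by the summed keyword overlap of their tags."""
    wanted = keywords(query)
    scores = {
        name: sum(len(wanted & keywords(canonical(t))) for t in tags)
        for name, tags in entries.items()
    }
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


# Illustrative data only; "exampletool" is a made-up entry.
ENTRIES = {
    "ffmpeg": ["transcode video", "mux video", "video to mpeg", "avi to mpeg"],
    "exampletool": ["convert to mpeg"],
}
QUERY = "convert video to mpg without quality loss"
```

With exact word matching like this, "loss" in the query does not hit "losslessly" in a tag, which is where partial word matches would help.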
Andrew Lee wrote: if something changes, some entries break and need to be verified/maintained.
Just curious, how do you see that happening? It is not a rhetorical question. I know things can break. I just want to know what you have in mind.
Andrew Lee wrote: You can argue the index doesn't need to be complete to be useful. But the fact is, if it's only 20%~30% complete, it won't be very useful. And that 20% to 30% is already a _ton_ of work.
Other than it not being an index of tags as I visualize it, I guess I am aiming at 80-90 percent by breaking things into manageable units rather than doing all the variations of tags together. I don't know how our search engine works, but if it indexes keywords from the database or performs direct search, then weighs the results, it will be good enough, I think. I do believe it should do partial word matches though. :) That will make things easier (like with "losslessly").
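To make the "partial word matches, then weigh the results" idea concrete, here is a rough sketch. It assumes a prefix rule (one word is the start of the other) and arbitrary weights of 1.0 for an exact match and 0.5 for a partial one. All names and numbers are hypothetical and not taken from the site's search engine.

```python
import re

EXACT_WEIGHT = 1.0
PARTIAL_WEIGHT = 0.5
MIN_PARTIAL_LEN = 3  # ignore very short fragments such as "to"


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_score(query_word, indexed_word):
    """Weight for one query word against one indexed keyword."""
    if query_word == indexed_word:
        return EXACT_WEIGHT
    shorter, longer = sorted((query_word, indexed_word), key=len)
    if len(shorter) >= MIN_PARTIAL_LEN and longer.startswith(shorter):
        return PARTIAL_WEIGHT  # e.g. "loss" vs "losslessly"
    return 0.0


def tag_score(query, tag):
    """Sum, over distinct query words, of the best match found in the tag."""
    tag_words = tokenize(tag)
    return sum(
        max((word_score(q, t) for t in tag_words), default=0.0)
        for q in set(tokenize(query))
    )


def search(query, tags):
    """Tags with a non-zero score, highest score first."""
    scored = [(tag_score(query, tag), tag) for tag in tags]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [tag for score, tag in ranked if score > 0]
```

A prefix rule is only one possible choice; substring or stemming-based matching would behave differently and may suit the database better.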

Andrew Lee wrote: What you have in mind could potentially work if the search domain is small, static and exhaustively maintained. That is why I brought up expert systems in my previous post. It turned out to have very limited application precisely because of that. Very few real world applications fall into this category. New information comes in all the time, and it became very costly and laborious for expert systems to remain updated and relevant.
I have to say I do believe that CLI tools fall comfortably enough within what you describe. The pool for Windows is comparatively small and relatively stable (if not fully static). See post #175 above for why I believe that. I wouldn't tackle such a project with even 10% of Softpedia's database for example.

"This idea will never scale."
I agree in part :) . See my previous point. But also, the categories and common use cases don't grow that much with time. The tools may. If we did the framework systematically and adopted a DRY approach (for example, see the last few paragraphs of my post #175 above), changing things en masse should be fairly easy or at least manageable in the vast majority of cases.
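One way to read the DRY point, sketched under assumptions: each tag is defined once in a central registry and entries store only tag IDs, so rewording a tag is a single change that every entry picks up. The structure, IDs, function names and the "exampletool" entry are hypothetical; post #175 is not reproduced here, so this may differ from what was proposed there.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    label: str
    aliases: set = field(default_factory=set)


# Each tag is defined exactly once; entries refer to it by ID only.
TAGS = {
    "conv-video": Tag("convert video to video", {"transcode video"}),
    "to-mpeg": Tag("convert to mpeg"),
}

# Illustrative entries; "exampletool" is a made-up name.
ENTRIES = {
    "ffmpeg": ["conv-video", "to-mpeg"],
    "exampletool": ["to-mpeg"],
}


def labels_for(entry):
    """Resolve an entry's tag IDs to their current labels."""
    return [TAGS[tag_id].label for tag_id in ENTRIES[entry]]


def rename_tag(tag_id, new_label):
    """Change a tag's wording in one place; the old wording stays as an alias."""
    tag = TAGS[tag_id]
    tag.aliases.add(tag.label)
    tag.label = new_label


def entries_with(tag_id):
    """Entries that would need a look if this tag changes or is retired."""
    return sorted(name for name, ids in ENTRIES.items() if tag_id in ids)
```

Calling `rename_tag("to-mpeg", "convert to mpeg/mpg")` would update what both entries display without touching either entry, and `entries_with` gives the list to re-verify when a format is dropped.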
Andrew Lee wrote: I think a more fruitful approach will be a better search engine.
I would actually like that. I mean, I am invested in this project and I wouldn't presume to tell you what to do with your effort on your site, but I wouldn't say no to both :mrgreen:.

Wall of text over!
"Is there a Windows-included tool for this task?"
"I only want open-source tools"
"I want a tool that is still actively developed"
"So many to choose from!"
and many more!
Support easy-to-do filters and badges!

Andrew Lee
Posts: 2507
Joined: Sat Feb 04, 2006 9:19 am

Re: CLI Database Discussions

#182 Post by Andrew Lee » Fri May 22, 2020 9:06 pm

About "alternativeto.net", I actually went a little further this morning, signed up for an account and poked around. If any of you have any insider information, I am all ears. But here are some of my thoughts:

- I don't think what you see is 100% crowd-sourced. There's definitely some kind of algo behind the scenes that takes user input and text/tag analysis to produce the clustering of similar software.

- Same with the tags. What you input is not immediately accepted, but goes into the backend for processing, probably with a combination of manual input and algorithmic processing. It's a blackbox as far as I can tell, not some transparent moderator approval process.

- Going by the activity in their forum, I'd be very surprised if crowd-sourced data forms the majority of their input. I am guessing data-scraping and text analysis play a bigger role.

- - - - - - - - - -

@vevy: I am very confused about 2 points from your arguments that keep coming up.

1. From your description of the use-cases that you would assign to a tool, it seems to be precisely the kind of micro-management that I and others feel will never be feasible. Yet, you argue it's not, while continuing to cite examples that appear to contradict that view. Very confusing! Maybe if you could exhaustively list all the "hundreds" of use-cases that you would assign to just _one_ tool, e.g. ffmpeg, we could have a better basis for further discussion.

2. It seems that you think the database/tags for the CLI tools will be small and somewhat static. Have you considered that a dynamic database with fully crowd-sourced data and a complicated approval process, like TPFC, is not a good fit for that requirement? IMHO, a simple Microsoft Access-like database edited by one or two people would be a better-fitting tool.

- - - - - - - - - -
vevy wrote: Just curious, how do you see that happening? It is not a rhetorical question. I know things can break. I just want to know what you have in mind.
A common one would be format/protocol changes/removal due to patent/security issues. Of course, you could argue that such changes don't occur very often, or it would be an easy update, or surely we don't have to put said format/protocol into the use-case etc. But that would be missing the point.

(Some examples that come to mind would be certain patented image formats like GIF, JPEG2000 etc., and crypto algorithms that used to be supported by SSH but have since been deprecated.)

I truly see a lot of similarities with expert systems (disclaimer: I used to be a research student back in the day). The solution to any problem would be "let's add more rules", or "let's change some rules", ad infinitum. No one ever steps back and asks, "Is this the right tool for the job?" :D
vevy wrote: I would actually like that. I mean, I am invested in this project and I wouldn't presume to tell you what to do with your effort on your site, but I wouldn't say no to both
Now if only one of the tech giants would contact me and pour some funding into my bank account, I would be glad to assemble a team to tackle the research and implementation 8) Meanwhile, the cheaper alternative is to use the Google custom search engine!

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#183 Post by vevy » Sat Jun 06, 2020 12:23 pm

Sorry for the delay. In the course of preparing the answer, I went down a rabbit hole of learning about video and audio codecs and history, then a couple of life events sidelined me a bit. I am back at it though!

webfork
Posts: 9523
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

Re: CLI Database Discussions

#184 Post by webfork » Sun Jun 07, 2020 7:16 pm

vevy wrote:
Sat Jun 06, 2020 12:23 pm
Sorry for the delay. In the course of preparing the answer, I went down a rabbit hole of learning about video and audio codecs and history, then a couple of life events sidelined me a bit. I am back at it though!
Whatever you decide to do next, it's already affected me. You may have noticed I'm posting more on CLI tools, which is in part because I'm gradually picking up more and more from that area.

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#185 Post by vevy » Mon Jun 08, 2020 12:30 pm

Thanks for the encouragement.

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#186 Post by vevy » Thu Sep 03, 2020 2:54 am

For the Unix tools with multiple ports, what to do with the release date:

- Use that of the original Unix tool (even though the ports may be behind)? Also, it may feel pointless to chase after updates that have no effect on the Windows user.

- Create an entry for each port, as wasteful and confusing as that may be?

- Use the date of the latest port?

- Don't enter a date at all?


The same question can be asked about version, size, website, etc.

Midas
Posts: 5526
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: CLI Database Discussions

#187 Post by Midas » Thu Sep 03, 2020 3:12 am

Use the main Windows release date. Add significant later dates if needed. :bulb:

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#188 Post by vevy » Thu Sep 03, 2020 3:23 am

Midas wrote:
Thu Sep 03, 2020 3:12 am
Use the main Windows release date.
There isn't one. There are multiple third-party efforts to bring it to Windows, but nothing from the original developer. :?

Midas
Posts: 5526
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: CLI Database Discussions

#189 Post by Midas » Thu Sep 03, 2020 3:27 am

vevy wrote: There isn't one.

... Or whatever you can surmise in good faith stands for it. In our context, it's more of a lamp-mark for user orientation than anything else, IMHO.

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#190 Post by vevy » Thu Sep 03, 2020 8:33 am

We are, lamentably, being bereaved of you to abstruse logophilia once more, my dearest Midas!

webfork
Posts: 9523
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

Re: CLI Database Discussions

#191 Post by webfork » Thu Sep 03, 2020 8:07 pm

vevy wrote:
Thu Sep 03, 2020 2:54 am
Create an entry for each port, as wasteful and confusing as that may be?
This is one of the issues I imagined would come up (not this one specifically but you get the idea) when trying to generate a comprehensive database. My recommendation is reminiscent of the saying "ask forgiveness, not permission": pick one based on overall quality and wait for feedback. If there's a lot of noise, you can certainly add another entry.

As to how to judge overall quality, I usually gauge that through clear documentation and reasonable release frequency. It usually shows a more mature project and suggests an engaged developer.
Last edited by webfork on Thu Sep 03, 2020 8:10 pm, edited 1 time in total.
Reason: (better wording)

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#192 Post by vevy » Fri Sep 04, 2020 1:52 am

Hmm... Thanks for the advice.

I had decided to go with the website and release date of the original tool. I will probably do something similar to what you suggest with size and similarly-specific fields.
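A rough sketch of how such an entry could be modelled, assuming the website and release date come from the original tool while port-specific fields such as size are read from one chosen port. The class names, field names, URL and all values are hypothetical and only illustrate the split.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Port:
    name: str
    released: date
    size_kb: int


@dataclass
class Entry:
    name: str
    website: str    # site of the original tool
    released: date  # release date of the original tool
    ports: list = field(default_factory=list)
    preferred_port: Optional[str] = None  # the one port picked for listing

    def listed_size_kb(self):
        """Size shown in the listing: taken from the preferred port, if any."""
        for port in self.ports:
            if port.name == self.preferred_port:
                return port.size_kb
        return None


# Made-up example values.
example = Entry(
    name="exampletool",
    website="https://example.org/exampletool",
    released=date(2020, 1, 15),
    ports=[
        Port("port-a", date(2018, 6, 1), 512),
        Port("port-b", date(2019, 3, 1), 640),
    ],
    preferred_port="port-b",
)
```

Switching to another port later would then only mean changing `preferred_port`, without creating a second entry.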

Midas
Posts: 5526
Joined: Mon Dec 07, 2009 7:09 am
Location: Sol3

Re: CLI Database Discussions

#193 Post by Midas » Fri Sep 04, 2020 3:37 am

I hope webfork's clarity will be more than enough to abscond me from my sins... :clown:

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#194 Post by vevy » Fri Sep 04, 2020 3:46 am

Thou art forgiven.

vevy
Posts: 678
Joined: Tue Sep 10, 2019 11:17 am

Re: CLI Database Discussions

#195 Post by vevy » Fri Sep 04, 2020 1:25 pm

100