Gnutella Forums

Gnutella Forums (https://www.gnutellaforums.com/)
-   LimeWire Beta Archives (https://www.gnutellaforums.com/limewire-beta-archives/)
-   -   Junk Filter (https://www.gnutellaforums.com/limewire-beta-archives/49917-junk-filter.html)

Grandpa December 16th, 2005 07:48 PM

Junk Filter
 
I marked some files as junk that I knew to be junk: viruses and spam. The strange thing is that afterwards, when I searched for other files, some results showed up as junk even though they were in no way associated with the files I had marked as junk.

I knew the files in question to be good files. I looked them up on Bitzi and they were listed there as known good files. They were in no way associated with the files I had marked as junk, not in name or in size.

So why is the junk filter flagging them as junk?

Also, I can change a file from junk to not junk, and on the next search it shows up as junk again.

4.9.39

Lord of the Rings December 16th, 2005 08:29 PM

I haven't tried it yet, but you can undo your junk settings in Tools>Options. The junk settings seemed to work for me, but I haven't used them extensively (Windows version, on a Mac).

ultracross December 16th, 2005 08:32 PM

Select the files that are not junk and click "Not Junk". The filter has to be trained to your own preferences.

Grandpa December 16th, 2005 08:42 PM

I have done that

Quote:

Also, I can change a file from junk to not junk, and on the next search it shows up as junk again.

ultracross December 16th, 2005 08:59 PM

hmm, could be a bug

Grandpa December 16th, 2005 09:19 PM

Maybe, maybe not. It might just take a while to learn what you want: a couple of them stopped showing up as junk after the 4th time I told it they were not junk. But others are still showing up as junk. The ones still showing up as junk are pretty close in size to the ones I marked as junk. What is the filter based on, the name or the file size?

Grandpa December 16th, 2005 09:27 PM

Whoops, spoke too soon: it is back to displaying them as junk. Also, the "show junk at bottom" and "don't show junk" options do not appear to work.

From general deduction, I am guessing it is filtering by IP address.

ultracross December 17th, 2005 05:35 PM

This is probably a bug related to one I discussed with LW devs previously. I will notify a developer about it.

Thanks for a confirmation on it. :)

Lord of the Rings December 17th, 2005 05:50 PM

1 Attachment(s)
Grandpa, have you adjusted the Junk Threshold? I notice LW 4.9.39 now lists this option under Tools>Options>Searching>Junk, whereas 4.9.37 listed this slider under the search options on the main interface.

That said, I can confirm Grandpa's findings. When I tried to filter out viruses with the filter set not to show junk, they still showed up (but with the trash can beside them).

stief December 17th, 2005 08:18 PM

I've had to turn off the junk filter in order to get any "What's New" results.

With my current set-up and habits, the filter was more of a hassle, but quite interesting to test, and working around the spam was already habitual :p

Other notes about the 4.9.39 Beta:

-- The About box needs updating (e.g., Dave Nicponski; Roger Kapsi).

-- Command-clicking (OS X) on one of several selected entries should deselect that entry. Instead, it deselects everything else and selects the entry (this is an old problem, like the false drag and drop problem in the downloads panel).

-- Search results are sorted case-sensitively in the name column.

-- When running as an ultrapeer (UP) with the filter on, heavy CPU activity somehow triggered a loss of leaves.

-- It would be nice if the new look for the search tabs also applied to the pane tabs, status bar icons, scroll buttons, and the like.

Grandpa December 17th, 2005 11:34 PM

Yes, I played with the slide bar, but I found it didn't make much difference. When training, I did software searches that returned large numbers of the 851.7 KB virus. I then sorted by file size and marked the whole group of 851.7 KB files, which was apparently a mistake.

I still haven't figured out a way to make it work without it filtering out a large number of good files. I will keep trying, but I have a feeling this filter is no better than any I have seen in the past, and eventually I will give up trying to make it work and disable it completely.

Lord of the Rings December 18th, 2005 04:11 AM

For me it worked well against auto-spam in 4.9.37, but I installed 4.9.39 before doing any more training. I then concentrated on the virus sizes, as you did, and found it wasn't as successful. I don't know whether one can conclude that it works well with auto-spam such as mp3 & iPod spam, but not so well with viruses.

* Some notes on how to use the spam filter most effectively would be helpful.

kmag December 19th, 2005 08:04 PM

Thanks for the feedback.

In 4.9.40, Roger fixed some display confusion to make the threshold for showing a trash can in the "quality" column equal to the threshold for hiding junk (if the hiding junk option is enabled).
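For readers following the display options discussed in this thread ("show junk", "show junk at bottom", "don't display junk"), here is a minimal Python sketch of the 4.9.40 behaviour kmag describes. A single cutoff decides both whether the trash can appears and whether the result is hidden. This is not LimeWire's code. The names `JunkDisplay` and `arrange_results`, the rating scale, the threshold value, and the use of `>=` at the cutoff are all assumptions made for illustration.

```python
from enum import Enum


class JunkDisplay(Enum):  # hypothetical names for the three options
    SHOW = "show"              # show junk inline, flagged with a trash can
    SHOW_AT_BOTTOM = "bottom"  # show junk after all non-junk results
    HIDE = "hide"              # don't display junk at all


def arrange_results(results, threshold, mode):
    """Return (name, is_junk) pairs in display order.

    results:   list of (name, junk_rating) pairs, ratings in [0.0, 1.0]
    threshold: hypothetical cutoff derived from the sensitivity slider
    The same cutoff decides both the trash-can icon and hiding.
    """
    flagged = [(name, rating >= threshold) for name, rating in results]
    if mode is JunkDisplay.HIDE:
        return [item for item in flagged if not item[1]]
    if mode is JunkDisplay.SHOW_AT_BOTTOM:
        # sorted() is stable: False (not junk) sorts before True (junk),
        # and the original order is kept within each group
        return sorted(flagged, key=lambda item: item[1])
    return flagged


results = [("a.mp3", 0.10), ("b.exe", 0.95), ("c.mp3", 0.40)]
print(arrange_results(results, 0.9, JunkDisplay.HIDE))
# [('a.mp3', False), ('c.mp3', False)]
```

With one shared cutoff, a result can never carry a trash can and still be shown while "don't display junk" is on. That mismatch is the one reported for 4.9.39 earlier in the thread.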

These problems with too many files being marked junk over time are likely caused by the filter being set up to learn more from bad hints than from good hints. (In the code, these hints are called "tokens".) Hopefully LW 4.9.40 is much better about this; the learning should be much less biased toward the bad unless you set the sensitivity above 50%.

The spam filter is actually a set of filters, where the file starts out being 100% good, and each filter multiplies the goodness by some value between 1.0 (inclusive) and 0.0 (exclusive). It's probably a host of different filters that are whittling the files down to a "junk" rating.
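The multiplicative scheme above can be sketched in a few lines of Python. A file starts 100% good, each filter multiplies the goodness by a factor in (0.0, 1.0], and the junk rating is whatever goodness has been lost. The two filters here (`suspicious_size`, `executable_name`) and their factors are hypothetical stand-ins. They are not LimeWire's actual filters.

```python
def junk_rating(file_info, filters):
    """Start at 100% good; each filter multiplies the goodness by a
    factor in (0.0, 1.0]. The junk rating is the goodness that was lost."""
    goodness = 1.0
    for f in filters:
        factor = f(file_info)
        if not 0.0 < factor <= 1.0:
            raise ValueError(f"factor out of range: {factor}")
        goodness *= factor
    return 1.0 - goodness


# Two hypothetical filters, for illustration only
def suspicious_size(info):
    return 0.5 if info["size_kb"] in (765.0, 851.7) else 1.0


def executable_name(info):
    return 0.6 if info["name"].lower().endswith(".exe") else 1.0


song = {"name": "south park christmas.exe", "size_kb": 851.7}
print(round(junk_rating(song, [suspicious_size, executable_name]), 2))  # 0.7
```

Neither filter alone calls the file junk, but together they bring it to a 70% junk rating. This is the "host of different filters whittling the files down" effect.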

Basically, if you search for some terms and get a result that you mark as spam, LW internally creates a bunch of tokens for the different things it knows about the file: a token for the size of the file, a token for each word in the title, and so on. Tokens that keep showing up in search results rated "very bad" gradually get marked more and more "bad". Tokens that keep showing up in search results rated "very good" gradually get marked more and more "good". Part of the problem is probably that the standard for "very good" was tougher than the standard for "very bad" (hard-coded to below 15% junk vs. above 70% junk), so with each search, the effect of the "bad" tokens relative to the "good" tokens was multiplied. Lots of very "bad" tokens mean lots of search results get very bad spam ratings, which means lots of tokens slowly get marked more "bad"... It's a snowball effect, and we need an opposing "good" snowball effect to cancel it out. This is an over-simplification, but hopefully it gives you a general idea of what goes on inside the spam filter.
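To make the snowball concrete, here is a toy Python model of the token learning described above. Only the 15% / 70% cutoffs and the idea of word and size tokens come from the post. The names (`TokenTable`, `tokenize`), the step size, the cap, and the update rule are assumptions. LimeWire's real token types and arithmetic are not shown in this thread.

```python
from collections import defaultdict

GOOD_CUTOFF = 0.15  # "very good": rated below 15% junk (figure from the post)
BAD_CUTOFF = 0.70   # "very bad": rated above 70% junk (figure from the post)
STEP = 0.05         # hypothetical learning rate
MAX_BADNESS = 0.95  # keeps every factor above 0.0, as the filters require


def tokenize(name, size):
    """One token per word in the title plus one for the file size."""
    return {"word:" + w for w in name.lower().split()} | {f"size:{size}"}


class TokenTable:
    def __init__(self):
        self.badness = defaultdict(float)  # 0.0 = no evidence of junk

    def rating(self, name, size):
        goodness = 1.0
        for token in tokenize(name, size):
            goodness *= 1.0 - self.badness.get(token, 0.0)
        return 1.0 - goodness

    def mark_junk(self, name, size):
        """Explicit user feedback: push every token of this result toward bad."""
        for token in tokenize(name, size):
            self.badness[token] = MAX_BADNESS

    def learn_from_results(self, results):
        """Passive learning from one search. Results already rated very bad
        make their tokens worse; results rated very good make them better.
        The good cutoff is stricter than the bad one, so bad evidence piles
        up faster: the snowball."""
        rated = [(n, s, self.rating(n, s)) for n, s in results]
        for name, size, r in rated:
            for token in tokenize(name, size):
                if r > BAD_CUTOFF:
                    self.badness[token] = min(MAX_BADNESS, self.badness[token] + STEP)
                elif r < GOOD_CUTOFF:
                    self.badness[token] = max(0.0, self.badness[token] - STEP)


table = TokenTable()
table.mark_junk("free ipod", 134.4)
# An unrelated title with the same size now rates as very bad...
print(table.rating("south park", 134.4) > BAD_CUTOFF)  # True
# ...so seeing it in a search makes the words "south" and "park" a bit "bad" too.
table.learn_from_results([("south park", 134.4)])
```

In this toy model, results rated anywhere between 15% and 70% teach the filter nothing. Only the two extremes feed the learning, which matches the imbalance described above.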

Give 4.9.40 a try and let us know how it works for you.

Don't be shy about going into the options and changing the sensitivity of the junk filter. In 4.9.40 (unlike 4.9.38 and 4.9.39), the sensitivity of the junk filter affects, to some extent, the balance of influence between bad junk ratings and good junk ratings. Below 50% sensitivity, it's hard to say which way the learning is biased. Above 50%, the learning becomes more and more biased toward increasing the "bad-ness" of tokens. Hopefully, with feedback from real-world usage, we can tweak the filter to have very little bias in the sensitivity ranges that people actually use.

Grandpa December 19th, 2005 10:28 PM

Thanks for the explanation.

Now I am going to have to eat my words: the filter in .40 seems to work extremely well so far. I have never liked filters, since none of them ever seemed to work, but so far this one does. It is very easy to train and very effective. I do like the fact that you can view the junk results, so if it makes a mistake you can correct it.

Time will tell whether it holds up, but so far it is very good. You guys have once again proved to me that you are among the best.

Now that that is out of the way, how about working on direct connect? You know, in the future that may be all we have.

Anyway, keep it up. You are the king of the hill and everybody is going to try to push you off. But if you keep making improvements like you have the last few years, they are going to have a hell of a time doing it. ;)

Lord of the Rings December 19th, 2005 11:49 PM

One thing I've noticed about the Windows version of 4.9.40 is that the stop search button sometimes becomes greyed out, so I can only stop a search by right-clicking the search tab. This has been happening in spurts, over groups of searches. Actually, I noticed it seems to happen after I right-click the tab to repeat the search. Or is this normal? I don't always want to stop a search.

As far as the filter goes, I find that the results still display despite it being set not to (I'm concentrating on virus sizes for program files, i.e. 765/851 KB). One moment they appear as normal; a few seconds later they show a trash can as the filter recognises them as junk. At present I have the slider at around 85-90%. I've increased it bit by bit over this session.

Sometimes a new one shows up in the results window. I click junk, and the others with a trash icon disappear along with it. What I mean is that only a small percentage of those detected as spam showed up in the results window. Is this normal? And why do they only sometimes disappear when another one is marked as junk?

Lord of the Rings December 21st, 2005 07:49 PM

It's greyed out until you do a search. Then select any item in the search results that you feel needs to be filtered out. Then the trash button should be accessible. Or did you already try that?

Under LW prefs>Filters>Junk you'll find options for how you want to use it.

I've been using the windows version so I'm not sure about the mac version.

Morb December 27th, 2005 01:43 PM

I had set the filter all the way to strict and set it not to show junk files. After I selected 4 files out of a list of 8 and marked them as junk, I suddenly had only 1 file remaining. Sometimes all the results in the search window disappear. That seems odd to me. I mostly mark the smaller files as junk because they're only spam, but some huge files are spam too, so I mark them as well. Then I do a completely different search for another file, with a totally different name, and find a ton of files that aren't junk marked as junk! Why? I set the filter to show junk files so that I could see what was marked as junk that I DIDN'T mark, and unmark it. Ugh. I've since just turned the whole damn thing off, as it seems to be more of a hassle than it's worth. I still say just add a file size filter. I don't see how you're supposed to train something when you don't know how it works or how it was designed in the first place. :rolleyes: Then when you sort, the junk files don't sort, and there are valid files above and below them. Just not worth it.

c_robertson December 27th, 2005 03:59 PM

Junk filter on file size
 
There has been quite a bit of discussion about filtering by file size, and I don't understand why the filter came out the way it did.
I (and most others) would be very happy if we could mark a file of ??k size as junk and have all files of that size filtered out. It seems so simple. Apparently there is more to it than that.

I have to conclude that this filter was also made to filter out similar file names. That seems risky, since there are good files with the same names as bad ones.

If you want to go that way, add a third, expandable pane for the junk files. Of course, you will then find that many good files are being filtered out, and you will never use the junk filter again.

That is what will happen, and what I will be doing.

Grandpa December 27th, 2005 06:33 PM

Well, if you train it, it actually works pretty well. Move the slide bar back to around the 30% to 40% mark and mark the files you know to be good as not junk. It will then start distinguishing between name and size. I have never liked filters of any kind, but if you take the time, this one works better than any I have used before.

If you don't want to take the time to figure it out, then don't use it. Just keep crying about the filter and maybe they will change it. But I hope not.

mfenech December 29th, 2005 09:29 AM

Why do I still see junk files in my search results (items flagged with the trashcan icon) if I select 'Do not display junk' in the Filter Options? Am I missing something? :confused:

c_robertson December 29th, 2005 09:14 PM

Filtering files size
 
No, you're not missing anything. If you move the slider to strict, you won't see much except the files it "thinks" might be trash, which gives you the option to mark them as not junk.

I have found you don't want to hide them because so many files that aren't junk will be hidden as well.

Crying? Really dumb thing to say.
As for working well, no, it doesn't. It works to an extent, but as pointed out, many files that aren't junk get marked as junk and you have to mark them as not junk, and the learning never ends. It really has to learn with every new file name.

It is a good attempt and a good idea, but it doesn't really work in practice.

Grandpa December 29th, 2005 09:20 PM

It's working fine for me. It took about 2 days to do most of the training. It still occasionally flags one that I have to mark as not junk, but the feature works quite well for me.

hopalong December 30th, 2005 12:02 PM

Grandpa !

I've been trying for days, only with the 134.4 KB audio spam, using the "Don't Display Junk" option, first with Strict and then with about 40% sensitivity. There is no difference. Whatever I search for, I always get the spam sooner or later (maybe sometimes I'm connected to "good" UPs).
After marking them as junk and searching again, I usually get the spam displayed as normal under another variation of the search string, while the marked string is displayed as 100% junk.
After marking the normal ones as junk again and searching again, I get the spam displayed again, but now with the junk trash icon and 100% in every line. If I try other search strings too, they are all displayed as 100% junk.
The filter works roughly as follows: in 80% of cases it doesn't work and all results are displayed, in 15% it works partly, and in 5% fully. I didn't get false junk results.
Maybe you are trying other cases, but I only get this spam.
There is one good outcome: the spam is easier to spot because of the trash icon and the 100% junk display, not only the 134.4 KB size.
I think the developer(s) still need to work on the filter.

Lord of the Rings December 30th, 2005 01:18 PM

It only took me 2 searches to filter out the typical auto-spam, such as mp3 spam and the M_Y_S_E_A_R_C_H type of spam with the iPod adverts. Viruses are harder to filter out; that needs a lot of persistence.

You need some references to work with. Here are some:
1. autogenerated spam results
2. autogenerated mp3 spam results
3. WARNING: Viruses on network you should be aware of!

* i.e. the spam filter works with both the file size and the file name. As previously said, you need to train it bit by bit. Some spam is easier to filter out than other spam! ;)

hopalong December 30th, 2005 02:21 PM

I've read all of your references, including what is written about this 134.4 KB mp3 spam. I should say that I have never downloaded such a file, and I have no virus, spyware or adware. This is verified regularly, online and with McAfee VirusScan, ZoneAlarm firewall, Spybot Search & Destroy, SpywareBlaster and Ad-Aware 1.5 Personal.
So what remains is training the junk filter, and I think I've already described how I've done it. Again: I search for the string Blabla1 blabla2, mark as junk every spam result that isn't already displayed as junk, and repeat this... But it is a never-ending cycle: I always get the spam displayed, already marked as junk, but not omitted from the results.
If you can, please tell me concretely how I can really filter it out. What would you do in this case?

Thanks for your help.

Lord of the Rings December 30th, 2005 02:33 PM

Perhaps you should set the spam filter to high detection in the option settings, and make sure you have it set not to show spam results.

I cannot explain why some people find the junk filter easier to use than others do. I suspect it has something to do with overall experience using LW; just a guess. Experienced users tend to know exactly what to filter out without burning out their thinking cells.

hopalong December 30th, 2005 10:12 PM

Yes, I set the options to "Don't Display Junk" and "Strict", but I tried with the sensitivity set about 40 % too.

I'm only asking the experienced and helpful people here (I've only been using LW Pro for about a year and a half) to help me use the junk filter effectively.

hopalong December 31st, 2005 04:13 AM

1 Attachment(s)
That's the result after training with the "Don't Display Junk" and Strict options. As can be seen, in some cases no filtering is in effect, and other searches are only partially filtered.
I've tried marking the junk lines as not junk and then as junk again. LW removed the lines immediately, but after searching again I got the previous result again, maybe with some variation.
When there are valid results for a meaningful search string, the result is the same, but I get valid result lines too.

FREELimeWirePRO January 5th, 2006 11:29 PM

Download Limewire
 
No Need To Pay Anything! 100% FREE!

http://www.limewire.com/english/content/download.shtml
:cool: :cool: :cool:

CROSSPOSTING
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Edited to comply with the House Rules.
Multiple copies of the same post will not be tolerated.
Forum Rules (click here)

SubJunk January 7th, 2006 01:02 AM

Hmm
 
The problem with the junk filter isn't a problem with the junk filter, it's a problem with the viruses. The viruses automatically rename themselves to whatever you search for.
For example, a "south park christmas" search will turn up a virus called "south park christmas". If you mark that as junk and then search for basically anything else, like "family guy buttsex", a file by that name will turn up. It may or may not be marked as junk too, depending on whether the virus has randomly modified a byte, which makes it a file with the same function but with slightly different contents.
Perhaps someone can explain this more thoroughly as I'm tired...
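SubJunk's point about a randomly modified byte can be illustrated with Python's standard `hashlib`. Flipping a single bit leaves the file size unchanged but produces a completely different SHA-1, so anything keyed on the exact hash sees a "new" file, while anything keyed on size still matches. The file contents below are made up, and whether this particular virus really mutates itself this way is SubJunk's claim, not something the sketch verifies.

```python
import hashlib


def flip_one_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte inverted."""
    mutated = bytearray(data)
    mutated[index] ^= 0x01
    return bytes(mutated)


original = bytes(range(256)) * 4  # stand-in for a file's contents
variant = flip_one_bit(original, 100)

print(len(original) == len(variant))  # True: same size
print(hashlib.sha1(original).hexdigest() == hashlib.sha1(variant).hexdigest())  # False
```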

Lord of the Rings January 7th, 2006 04:01 AM

The junk filter is a learning filter. As you mark more files with the same characteristics (e.g. files 765 KB in size), the filter gradually learns to filter those out. The common harmless spam, like the autogenerated spam results and autogenerated mp3 spam results, is very easy to filter out. It only took me 2 searches to eradicate those from my search results completely, by searching for terms that would deliberately bring up those types of results. Virus files, however, are much more difficult to filter out; it takes much longer. Other virus sizes to filter out are 399 KB, 851 KB, 60.5 KB. You may want to check your filtered results, however, to make sure you don't filter out the wrong items by mistake. I guess it also depends on the sensitivity you set under Tools>Options>Filters>Junk.

exp January 7th, 2006 08:36 AM

All this info is interesting, but one question...
Why the hell are people storing virus files on their PCs?
It takes some sick people who want to infect everyone.
As for filtering, I filtered out the file size 851.7KB but it still comes through when searching! I tried different variations on the filter (851.75KB, 851.7 KB, etc.) but it still appears! Why is this?

Sphinx January 7th, 2006 09:33 AM

Normal users do not store viruses on their computers; those are generated by fake hosts, so beware of them. And filters can't block by file sizes. ;)

Grandpa January 7th, 2006 10:44 AM

A lot of the people sharing the virus do not even know they have it. They are just naive about what they are doing. Just look through this forum: people come here all the time saying "LimeWire won't shut down, what's wrong?"

Or they come here asking why there are so many files 851.7 KB in size: "I downloaded it and opened it, but it didn't work." These people do not even realize that once they have opened it, they have infected themselves and are sharing literally hundreds of these files on the net, since this virus replicates itself.

It probably only takes a few days for these people to realize they have a virus, but in those few days they can end up with thousands of copies of this file. I am not sure, but I think this file can read the names of other files and rename itself when it copies itself.

I think you can figure out the rest, but just in case you can't: 1 person downloading it = 1000s of files shared. That is basically why there are so many of them.

And by the way, if I were a person who wanted to destroy a P2P network, I think this might be a good way to start down that path. All I would have to do is release 1 file into the wild and watch it grow.

exp January 7th, 2006 11:05 AM

Re: Hmm
 
OK, now I'm confused!!!!
Sphinx, you said (Quote) filters can't block by file sizes.

But Lord of the Rings says
(Quote) Other virus sizes to filter out are 399 KB, 851 KB, 60.5 KB.

So what does the filter block, 'words' or 'names'? Or can it block file sizes?
If it's just names, then how can you block something like a song title, if that's what you're looking for?

Grandpa January 7th, 2006 11:17 AM

From what I understand, the new smart filter learns from the association between name and file size, which is why training takes time. Over time, it will associate a file of a certain size with a virus. It will also decide that a certain name is a virus. But if you mark as not junk a file that was flagged because of its name but not its size, it will learn over time that a file with that name is not necessarily junk, but a file with that size and name is.
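Grandpa's description of name and size association can be modelled roughly as separate tokens for the name, the size, and the name+size pair, each with its own junk and not-junk votes. This is a hypothetical Python sketch of his explanation, not LimeWire's algorithm. The class `AssociationFilter`, the vote counting, and the "average above 0.5" rule are all assumptions chosen for illustration.

```python
from collections import Counter


def tokens(name, size_kb):
    """Separate tokens for the name, the size, and the name+size pair."""
    name = name.lower()
    return [f"name:{name}", f"size:{size_kb}", f"pair:{name}|{size_kb}"]


class AssociationFilter:  # hypothetical, for illustration only
    def __init__(self):
        self.junk_votes = Counter()
        self.good_votes = Counter()

    def mark(self, name, size_kb, is_junk):
        votes = self.junk_votes if is_junk else self.good_votes
        votes.update(tokens(name, size_kb))

    def is_junk(self, name, size_kb):
        """Average the junk fraction of every token that has any votes."""
        scores = []
        for t in tokens(name, size_kb):
            j, g = self.junk_votes[t], self.good_votes[t]
            if j + g:
                scores.append(j / (j + g))
        return bool(scores) and sum(scores) / len(scores) > 0.5


f = AssociationFilter()
f.mark("south park", 851.7, is_junk=True)     # the virus copy
f.mark("south park", 350000, is_junk=False)   # the real episode
print(f.is_junk("south park", 851.7))   # True
print(f.is_junk("south park", 350000))  # False
print(f.is_junk("family guy", 851.7))   # True: same size as the virus
```

Marking the real episode as not junk cancels the name on its own (it now scores 50/50), while the size and the name+size pair still identify the virus copy. This matches the behaviour Grandpa describes.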

hopalong January 13th, 2006 07:00 AM

Junk Filter again
 
1 Attachment(s)
I'm sorry, but I still can't filter out the mp3 spam (134.4 KB). After I mark them and search again, they are displayed with the trash can and 100% junk, despite the "Don't Display Junk" and Strict sensitivity settings.
I attached console debug output for ..spam.* in the hope that somebody can help me.

mfenech January 13th, 2006 10:47 AM

Same problem here as hopalong. I often look for obscure music that only comes up once in a while, and I get the flagged (trash can) results constantly. There have been several updates since this filter's 'do not display junk' option was added, and it still won't hide the junk files. It doesn't make any sense.

exp January 14th, 2006 12:59 PM

I'm having the same problem as above. I know the whole filter learning process takes time, but it's a pain, because it marks other files as junk (which are not), so you spend a lot of time unmarking them... So if the filter did work, i.e. didn't show junk, then a lot of the files we are searching for would not appear, because it would block them... Does this make sense?

Grandpa January 14th, 2006 01:43 PM

Yes, it makes sense; that is why the show junk and show junk at bottom options are there. Until you get it trained, you need to watch it.

hopalong January 14th, 2006 02:07 PM

The Display/Don't Display Junk setting controls what is displayed. The problem is that the trash can is always displayed.
In previous betas the junk option was under Searching, but I think its current place under Filters is much better, and to me it suggests that the filtering happens at display time rather than at search time. It would be nice to put the Display/Don't Display Junk option on the search results too, so one could toggle it even after the search has ended.
Sorry for the slow reply, exp.

Grandpa January 14th, 2006 02:44 PM

Agreed

hopalong January 19th, 2006 04:58 AM

LW 4.10.4 spam filter works!
 
As I see it, there is now an extra redisplay step when a result turns out to be junk, so the Don't Display and Display at the Bottom settings now work well here with the mp3 spam.

hopalong January 19th, 2006 11:22 AM

one problem with spam
 
Sometimes the spam occurs in a line only 1, 2 or 3 times, and the junk percentage is 0, 50 or 66% respectively, so the line's quality column shows stars rather than a trash can, and that's why the line is not filtered out. Normally, when the junk percentage is 100%, the line is filtered out when LW 4.10.4 redisplays.
It's no use marking it as junk.
My main question is why this line is not immediately marked as junk (and filtered out) when another line in the same search, with the same name and size, has already been marked as junk.
Unfortunately, sometimes the name is a variation, for example asd_qwe instead of asd qwe.
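The asd_qwe vs. asd qwe case shows why tokenization matters for a word-based filter. The Python sketch below compares a hypothetical whitespace-only tokenizer with one that treats any run of non-alphanumeric characters as a separator. Nothing in this thread says how LimeWire actually splits names; both functions are illustrative assumptions, and the normalized version treats non-ASCII letters as separators too.

```python
import re


def naive_tokens(name):
    # split on whitespace only
    return set(name.lower().split())


def normalized_tokens(name):
    # treat any run of characters other than ASCII letters/digits
    # (spaces, underscores, dots, dashes) as a separator
    return {t for t in re.split(r"[^0-9a-z]+", name.lower()) if t}


print(naive_tokens("asd_qwe") == naive_tokens("asd qwe"))            # False
print(normalized_tokens("asd_qwe") == normalized_tokens("asd qwe"))  # True
```

With the normalized form, the spam variant and the already-marked name share the same word tokens, so marks on one would carry over to the other.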

hopalong February 14th, 2006 03:33 AM

LW 4.10.8 junk problem
 
1 Attachment(s)
Sometimes one or two instances remain as not junk, with a 50% junk value for two instances. I think the 1st instance should already be junk, because this result (SHA-1) has already occurred as junk.
Another problem is that the number of instances (3/51) changed to (2/3) after marking as junk, rather than to (2/51).
The new junk marking doesn't help.

Lord of the Rings August 5th, 2006 04:38 AM

Junk Filter
 
Some people have been thinking the junk filter filters out material that shouldn't be filtered. Well, the junk filter won't filter out anything you haven't trained it to. In fact, if you've never used the junk filter, nothing will be filtered out at all. I have just confirmed this (with the setting on Strict/maximum). If there's something wrong with your junk filtering, tell the filter that the particular file is not junk; keep doing that and it will stop filtering it out. In other words, select the file and press the Not Junk button. Otherwise, tell the filter to forget all training. This can be done in the junk filter options under Tools>Options>Filters>Junk.

I thought I should add this to the archive, since some people have been getting confused. So be careful how you use the junk filter.


All times are GMT -7.


Copyright © 2020 Gnutella Forums.
All Rights Reserved.