

How do you fight image-spam?

Posted by Anonymous on Sat 14 Oct 2006 at 19:25

Over the past few months there has been a dramatic rise in a new type of spam mailing, which consists of semi-random words plus the real message embedded inside an image. How do you deal with this?

The gocr package, available in Debian Sarge and other releases, attempts to perform OCR, but this process is very fragile.

Although fragile and fairly resource-intensive, OCR has been made available as a plugin for complex anti-spam solutions such as SpamAssassin. The Fuzzy OCR plugin appears to be the dominant solution right now.

But for those of us not using SpamAssassin, which solutions exist and actually work?

How do you fight this problem?

Short of using image dimensions, or filtering all mail with an attachment, is there a simple solution?

 

 


Re: How do you fight image-spam?
Posted by Steve (62.30.xx.xx) on Sat 14 Oct 2006 at 19:35

I've been experimenting with gocr and ocrad for a day or two, with little success.

Both tools will only process "pnm" files rather than the .GIF or .JPG files I've been receiving. Converting many of the images I've got to that format just fails; however, when the conversion succeeds the OCR generally does a good job.

I'm assuming that the conversion fails because of multi-image files or other perversities. Still, I've not explored automating this with procmail, as I'm not sure how to go about extracting the attachments and working on them.
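
On the question of extracting the attachments: one possible approach, sketched here with Python's standard `email` module, is to walk the MIME parts of the raw message and collect every part whose main type is `image`. The function name `extract_images` is made up for this illustration, and the sketch does no conversion or OCR itself; it only returns the decoded bytes, which could then be handed to whatever converter and OCR tool you are experimenting with.

```python
from email import message_from_bytes


def extract_images(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (content_type, decoded_bytes) for every image part of a raw message.

    Hypothetical helper for illustration; it is not part of gocr, ocrad or procmail.
    """
    msg = message_from_bytes(raw)
    images = []
    # walk() visits the message itself and every nested MIME part.
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # decode=True undoes the base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            if payload:
                images.append((part.get_content_type(), payload))
    return images
```

A procmail recipe could pipe the message to a small script built around this function, which would be one way to get at the images without saving the whole mail to disk first.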

Steve


Re: How do you fight image-spam?
Posted by Anonymous (84.160.xx.xx) on Sun 15 Oct 2006 at 01:24
What do you need this for? The methods to use highly depend on that answer.

I wouldn't do a full OCR pass on the image, as it is expensive. Do some mathematical tests on it which don't cost too much CPU time. If the recipient is supposed to be able to read a text message in the end, that surely shows up somehow...

If you don't need a general purpose solution I think it's easier to build rules on who is allowed to send images to what extent and then do some simple tests on these.


cb


Re: How do you fight image-spam?
Posted by Anonymous (82.82.xx.xx) on Sun 15 Oct 2006 at 09:25
I always wonder why people like to hog their computers with OCR spam scanning; with the 70_sare_stocks rules from Rulesemporium I already catch most of them.


Re: How do you fight image-spam?
Posted by marki (89.173.xx.xx) on Sun 15 Oct 2006 at 10:11
You can try ImageInfo from http://www.rulesemporium.com/plugins.htm (I wonder why there isn't any link to plugins on their homepage). It scores a message based on the number of images, their dimensions, the image-to-text ratio, pixel coverage... Not perfect, but better than nothing.
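
To give a feel for how cheap such checks can be compared with OCR, here is a rough Python sketch of two of them: reading the dimensions straight out of a GIF header, and computing a text-to-pixel ratio. This is not ImageInfo's code; the function names and the idea of dividing characters by pixels are assumptions made for this illustration, and it only understands GIF files.

```python
import struct


def gif_dimensions(data: bytes):
    """Return (width, height) from a GIF header, or None if this is not a GIF."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        return None
    # The logical screen width and height are two little-endian 16-bit
    # integers stored right after the 6-byte signature.
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


def text_to_pixel_ratio(text: str, images: list[bytes]) -> float:
    """Characters of body text per pixel of GIF area (inf if there are no pixels)."""
    pixels = 0
    for img in images:
        dims = gif_dimensions(img)
        if dims:
            pixels += dims[0] * dims[1]
    return len(text) / pixels if pixels else float("inf")
```

A very low ratio (a large image with only a few words around it) would be one hint among several, and any threshold would have to be tuned against your own mail.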


Re: How do you fight image-spam?
Posted by hardik (59.95.xx.xx) on Mon 16 Oct 2006 at 12:55
Wow, this is really useful for fighting image spam. As the log below shows, the score changed drastically, from 5.3 to 11.3.
------------------------------------------------------------------------------
Mon, 16 Oct 2006 17:05:55 IST:19356: SA: REPORT hits = 11.3/4.0
  0.9 HTML_IMAGE_ONLY_24     BODY: HTML: images with 2000-2400 bytes of words
  0.0 HTML_MESSAGE           BODY: HTML included in message
  0.9 HTML_10_20             BODY: Message is 10% to 20% HTML
  0.0 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
  1.0 DC_IMG_HTML_RATIO      RAW: Low rawbody to pixel area ratio
  0.8 SARE_GIF_ATTACH        FULL: Email has a inline gif
  3.0 DC_GIF_UNO_LARGO       Message contains a single large inline gif
  1.7 SARE_GIF_STOX          Inline Gif with little HTML
  3.0 DC_IMAGE_SPAM_HTML     Possible Image-only spam
------------------------------------------------------------------------------
-- Hardik Dalwadi, National Innovation Foundation


Re: How do you fight image-spam?
Posted by simonw (84.45.xx.xx) on Sun 15 Oct 2006 at 10:40
All previous antispam advice from myself on Debian Administration, with the exception of blocking Microsoft executable attachment types, is independent of the content of the emails themselves. As such, these techniques work against image spam just as effectively as against any other type of spam (including whatever content trick the spammers try next), and importantly, the false positive rate is unaffected by the content of the email.

Given some of our customers are in porn, pharmaceuticals, or medicine, I think one can focus overly much on content.

Content does not make an email "bulk" or "unsolicited", which is what makes it spam, although distributed-checksum ideas are a form of content inspection that can identify "bulk".

So while content filters may satisfy people by stopping content that they don't want, that isn't necessarily the same thing as stopping bulk unsolicited email, or spam.

For all but the smallest email servers, content filters will likely have to be tuned to end users' needs, or unacceptable rates of false positives are likely to occur. This is as likely to apply to images in emails as anything else.


Re: How do you fight image-spam?
Posted by Anonymous (81.206.xx.xx) on Tue 17 Oct 2006 at 14:49
I agree, content-based spam filters won't do.
They lead to huge numbers of false positives and, most importantly, they are only a temporary solution. Spammers will keep inventing new content tricks to bypass the spam filter ruleset. A never-ending battle.

Wouldn't it be a great idea to set up an RBL system based on the checksum of a message?

Suppose a generic mail server computes a CRC of an incoming e-mail (it could be of the message body or of sender/subject). It then consults a global server to see how often this checksum is present in its database. If it's a new CRC, the global server stores it as a new entry; otherwise it adds 1 to the total count for this CRC in the database.
When the mail server receives the CRC count from the global server, it can reject the e-mail in question based on a local "reject_count" variable.

If enough mail servers join in, one could keep track of the bulk messages sent out around the world.
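
The scheme described above can be sketched in a few lines of Python. Everything here is hypothetical: the class `ChecksumRegistry` stands in for the global server (kept in memory rather than reached over the network), the whitespace and case normalisation is an assumption added so that trivially reformatted copies collide, and the threshold of 3 is arbitrary.

```python
import zlib


class ChecksumRegistry:
    """In-memory stand-in for the proposed global checksum server."""

    def __init__(self):
        self.counts = {}  # crc -> number of times it has been reported

    def report(self, body: str) -> int:
        """Record one sighting of this body and return its total count."""
        # Assumption: collapse whitespace and case before hashing.
        normalized = " ".join(body.split()).lower()
        crc = zlib.crc32(normalized.encode("utf-8"))
        self.counts[crc] = self.counts.get(crc, 0) + 1
        return self.counts[crc]


REJECT_COUNT = 3  # the local "reject_count" variable; value chosen arbitrarily


def should_reject(registry: ChecksumRegistry, body: str,
                  reject_count: int = REJECT_COUNT) -> bool:
    """Reject once the same checksum has been seen more than reject_count times."""
    return registry.report(body) > reject_count
```

An exact checksum like this is defeated by changing a single character per message, which is why real systems use fuzzier fingerprints.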


Re: How do you fight image-spam?
Posted by simonw (84.45.xx.xx) on Tue 17 Oct 2006 at 18:24
Sounds like the Distributed Checksum Clearinghouse (DCC):

http://www.rhyolite.com/anti-spam/dcc/

But since emails are rarely exactly identical, you have to make some assumptions when matching them, which risks false positives, or else risk easy defeat; another difficult line to walk. Still, it stops the simple ones I'm sure, and raises the barrier to spammers, so those that fancy the idea should go for it.

One must also be careful with email lists and other solicited bulk email, depending on how you implement such systems.

I'm sure more distributed antispam systems will evolve, since it is a natural way to spot bulk spamming. Indeed some RBLs already use the queries they get to identify new SMTP senders, and thus potential candidates for inclusion, which is a kind of dynamic antispam system.


Re: How do you fight image-spam?
Posted by asdmin (212.51.xx.xx) on Sun 15 Oct 2006 at 17:48
Even if it's not fully on topic, a good way of fighting spam is greylisting. It saves you from heavily interpreting images and messages, because it does its job at the MTA level (working with the e-mail envelope information).

For general information, good starting points are:
http://en.wikipedia.org/wiki/Greylisting
http://www.greylisting.org/
http://greylisting.org/implementations/postfix.shtml (how can it be done with postfix)
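
For readers who have not met greylisting before, the core idea fits in a short Python sketch: remember when each (client IP, sender, recipient) triplet was first seen, answer with a temporary failure until a delay has passed, and accept the retry afterwards. The class name, the 300-second delay and the in-memory dictionary are assumptions for illustration; real implementations also expire old entries and keep their state in a database.

```python
class Greylist:
    """Toy greylist keyed on the (client_ip, sender, recipient) triplet."""

    def __init__(self, delay_seconds: int = 300):
        self.delay = delay_seconds
        self.first_seen = {}  # triplet -> timestamp of the first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Return "defer" (temporary failure) or "accept" for this delivery attempt."""
        triplet = (client_ip, sender, recipient)
        # setdefault stores `now` only on the first attempt for this triplet.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.delay:
            return "accept"
        return "defer"
```

A proper MTA retries after a temporary failure and gets through on a later attempt, while much spam software never retries, which is where the benefit comes from.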

--
Dániel Vásárhelyi


Re: How do you fight image-spam?
Posted by Steve (62.30.xx.xx) on Sun 15 Oct 2006 at 21:44


Re: How do you fight image-spam?
Posted by bdf (134.184.xx.xx) on Mon 16 Oct 2006 at 08:41
For the last few months, I've been using complementary RBL checks to detect spam, mainly because spammers get better at circumventing the basic Bayesian filtering that is now commonplace. So far, this has had very good results - enough for me to think that RBL checks are an underrated solution. Our own Steve is on the record saying they are flawed by design (if he's indeed talking about RBLs there), but I wonder what his objections are (and if they can be fixed).

Although there is certainly room for abuse of blacklists, I think RBL checks have two strong advantages in the long run. They are not content-based and can therefore not be circumvented by using images and similar tricks, reducing the amount of catch-up you have to play with spammers. Additionally, the DNS lookup required for an RBL check is a very cheap operation. Currently, your server might still have the spare cycles to do an OCR scan of every mail image, but when you're doing substantially more work to detect spams than it takes to generate them, a denial-of-service attack becomes possible by simply increasing the amount of e-mail that's expensive to check.

To employ RBL checks with Postfix 2.x, look into the reject_rbl_client directive. This will reject the delivery of e-mails by blacklisted servers (it won't just delete e-mail - a proper sender will still receive a bounce if his message is not delivered). If you want more flexibility, you can tag messages using rblcheck and filter them based on your own procmail rules:
apt-get install rblcheck procmail
Unfortunately this last process could use more documentation and examples than what is available here.
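
As a small illustration of why the lookup is cheap, here is a Python sketch of what a DNS blacklist query looks like: the octets of the client's IPv4 address are reversed and prepended to the blacklist's zone, and any successful A-record lookup means the address is listed. The zone `dnsbl.example.org` used in the usage note is a placeholder, not a real blacklist, and the function names are made up for this example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone answers for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not on this list.
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` yields `99.2.0.192.dnsbl.example.org`. A script called from procmail could use the result to add a header instead of rejecting outright.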


Re: How do you fight image-spam?
Posted by asdmin (194.237.xx.xx) on Tue 17 Oct 2006 at 07:55
RBLs have too many false positives. Very often, fully legitimate SMTP servers end up on an RBL.
A better idea is to use RBLs for "routing" messages into a greylist (or another nasty non-blocking spam trap).

asd

--
Dániel Vásárhelyi


Re: How do you fight image-spam?
Posted by bdf (134.184.xx.xx) on Tue 17 Oct 2006 at 10:10
YMMV. I haven't encountered such cases yet. This also depends heavily on which RBLs you include, what their policy is, etc. If false positives are critical to you, you may choose to only tag a message as spam if the sender appears in multiple RBLs or if other indications are present. And of course you can whitelist the legitimate servers you regularly trade mail with.


Re: How do you fight image-spam?
Posted by asdmin (195.228.xx.xx) on Tue 17 Oct 2006 at 10:41
For this purpose (I mean whitelisting), I'm using SPF to accept legitimate mail fast and to reject mail on an SPF failure (the sender's origin is not in the SPF origin list); the rest of the mail (no SPF record) goes into a greylist (where 90% of it stays)...

asd

--
Dániel Vásárhelyi


Re: How do you fight image-spam?
Posted by chris (213.187.xx.xx) on Tue 17 Oct 2006 at 18:07
What software/config are you using for this setup? I currently have all mail going through greylisting using exim4-heavy, and would love to let SPF-validated mail through without it.


Re: How do you fight image-spam?
Posted by asdmin (195.228.xx.xx) on Wed 18 Oct 2006 at 07:44
I'm using Postfix's UCE controls and policy delegation.

Related parts:

smtpd_recipient_restrictions =
    reject_non_fqdn_recipient
    reject_unknown_recipient_domain
    permit_mynetworks
    reject_unauth_destination
    check_recipient_access hash:/etc/postfix/maps/recipient_access
    check_sender_access hash:/etc/postfix/maps/sender_access
    check_client_access hash:/etc/postfix/maps/client_access
    check_policy_service inet:127.0.0.1:10000
    check_policy_service inet:127.0.0.1:2525

The last two lines are the important ones:
the one before last does the SPF checking (as described at http://spf.pobox.com/, package name: whitelister)
the last does the greylisting (as described at http://www.greylisting.org, package name: postfix-gld)

The possible scenarios:
- the first rule accepts the mail (reply: OK): the second rule isn't called at all, and the mail is let in
- the first rule answers DUNNO: the second rule is activated, so the mail goes into the greylist, and once a predefined number of seconds has passed a later retry will be accepted
- the first rule rejects the mail (550): in this case SPF showed that the sender is not permitted to send mail from that host, so the mail is rejected and the second rule isn't called at all
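
The three scenarios above amount to a first-decisive-answer-wins chain, which can be modelled in Python as follows. This is only a toy model of the evaluation order, not how Postfix or the two policy daemons are implemented: the request is a plain dictionary, the keys `spf` and `seen_before` are invented for the example, and the greylist answer `DEFER_IF_PERMIT` is assumed as the temporary-failure reply.

```python
def spf_policy(request: dict) -> str:
    """Hypothetical SPF check: OK on pass, REJECT on fail, DUNNO otherwise."""
    result = request.get("spf")
    if result == "pass":
        return "OK"
    if result == "fail":
        return "REJECT"
    return "DUNNO"  # no SPF record: let the next rule decide


def greylist_policy(request: dict) -> str:
    """Hypothetical greylist check: defer the first attempt, then stay neutral."""
    return "DUNNO" if request.get("seen_before") else "DEFER_IF_PERMIT"


def evaluate(policies, request: dict) -> str:
    """Ask each policy in order; the first answer other than DUNNO is final."""
    for policy in policies:
        answer = policy(request)
        if answer != "DUNNO":
            return answer
    return "DUNNO"
```

With `evaluate([spf_policy, greylist_policy], request)`, an SPF pass or fail is decided by the first rule alone, and only mail without an SPF verdict ever reaches the greylist.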

check_policy_service is described at:
http://www.postfix.org/SMTPD_POLICY_README.html

I recommend that everyone use SPF; it can really help people fight spam. Of course, the effectiveness of SPF depends heavily on how many sysadmins add that single TXT record to their domains, but if they do (like the biggest free mail providers: Gmail, Yahoo and even Hotmail), the spammers' ability to fake e-mail addresses shrinks significantly.

It's only one TXT record in your domain, and people who check SPF will then permit mail claiming to come from your domain only when it is sent by your SMTP server...

asd
--
Dániel Vásárhelyi


Re: How do you fight image-spam?
Posted by chris (217.8.xx.xx) on Wed 18 Oct 2006 at 07:54
You can even get help here: http://www.openspf.org/ - they have a form that will help you generate the correct TXT record.


Re: How do you fight image-spam?
Posted by simonw (84.45.xx.xx) on Sun 22 Oct 2006 at 22:28
I thought the majority of the email with a valid SPF record was spam.

Have the spammers abandoned SPF as well now?


Re: How do you fight image-spam?
Posted by Anonymous (213.164.xx.xx) on Mon 23 Oct 2006 at 11:12
> I thought the majority of the email with a valid SPF record was spam.

That may be the case, but if a domain is hijacked for forged spam sending, adding an SPF record helps people who check SPF block the spam very quickly.

The next stage of SPF is a trust metric. Do you know if that's set up?


postfix body_checks
Posted by Darxus (209.150.xx.xx) on Fri 1 Dec 2006 at 04:19
/\bsrc\s*=(?:3D)?\s*["']?cid:/ REJECT Your email was rejected because you embedded an attached image in the body.

Catches 100% of them before the initial SMTP transaction finishes. And if there are any false positives the sender gets an email saying "Your email was rejected because you embedded an attached image in the body." or whatever you set it to. And if it was a spammer no bounce goes to the forged From: address, because the sending MTA delivers the bounce message. I'm extremely happy with it.

http://www.postfix.org/header_checks.5.html


Re: postfix body_checks
Posted by Anonymous (80.75.xx.xx) on Fri 22 Dec 2006 at 13:39
ehh, so what you are saying is, reject any mail that has an inline image?
Sounds pretty hard to me...


Re: postfix body_checks
Posted by Darxus (209.150.xx.xx) on Fri 22 Dec 2006 at 14:14
No, I said "embedded ... attached image". So an inline image that is not attached but hosted on a webserver, or an attached image that is not inlined (not img src'd in the email) would go through just fine (and leave plenty of information for spam filtering). Only inline attached images would get blocked, and it's not hard, they all have to match that regex. URLs for attachments start with "cid:", and img tags aren't that complicated - so this is just a simple but thorough regex to match things similar to 'src="cid:'.
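
To see which lines the pattern does and does not match, it can be tried out in Python, whose `re` module accepts the same syntax for this expression. The sample lines in the usage note are made up, and the `re.IGNORECASE` flag is an assumption meant to mirror case-insensitive matching in the Postfix table.

```python
import re

# Same expression as in the body_checks rule above; IGNORECASE is an
# assumption mirroring case-insensitive matching on the Postfix side.
INLINE_CID = re.compile(r"""\bsrc\s*=(?:3D)?\s*["']?cid:""", re.IGNORECASE)


def embeds_attached_image(body_line: str) -> bool:
    """True if the line contains an img-style src pointing at a cid: URL."""
    return INLINE_CID.search(body_line) is not None
```

`embeds_attached_image('<img src="cid:part1@example">')` is true, as is the quoted-printable form `'<IMG SRC=3D"cid:part1@example">'` (that is what the optional `3D` is for), while `'<img src="http://example.com/a.gif">'` is not matched.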


FuzzyOcr hit Debian unstable yesterday!
Posted by liotier (130.133.xx.xx) on Tue 19 Dec 2006 at 14:07
FuzzyOcr hit Debian unstable yesterday! Mail server administrators rejoice! Somebody must have been even more pissed off than me about image spam and decided to make the Debian packaging work… Installing FuzzyOcr on a Debian server is now trivially easy. I have installed FuzzyOcr with great success and written up my experience.
