
Posted by fugit on Thu 16 Apr 2015 at 16:56
Tags: none.

The Problem: Vendor software we were looking to evaluate did not compile correctly using the latest version of wheezy (7.8 at the time of writing).

The Solution: We had to build a new server using snapshot.debian.org. This allowed us to build the server as it stood at a point in time after 7.1 was released but before the 7.2 release. Below are the settings we used.

    cat /etc/apt/sources.list
    deb     http://snapshot.debian.org/archive/debian/20130802T034916Z/ wheezy main
    deb     http://snapshot.debian.org/archive/debian-security/20130802T202345Z/  wheezy/updates main

For more details head on over to the site.
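As a rough sketch of how those two lines are put together, the helper below prints them for a given suite and pair of timestamps. The function name snapshot_sources is made up for this example. The apt option in the closing comment is the one usually suggested for snapshot archives, since old Release files may carry an expired Valid-Until date; treat it as an assumption for your apt version.

```shell
#!/bin/sh
# Print sources.list lines pinned to snapshot.debian.org timestamps.
# Arguments: suite, main-archive timestamp, security-archive timestamp.
# (snapshot_sources is a hypothetical helper name, not a Debian tool.)
snapshot_sources() {
    suite=$1
    main_ts=$2
    sec_ts=$3
    base=http://snapshot.debian.org/archive
    printf 'deb %s/debian/%s/ %s main\n' "$base" "$main_ts" "$suite"
    printf 'deb %s/debian-security/%s/ %s/updates main\n' "$base" "$sec_ts" "$suite"
}

snapshot_sources wheezy 20130802T034916Z 20130802T202345Z

# If apt complains that the Release file has expired, the check can be
# switched off for a single run:
#   apt-get -o Acquire::Check-Valid-Until=false update
```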

 

Posted by fugit on Thu 26 Dec 2013 at 18:15
The Problem: The configuration that worked under squeeze for bonding and VLANs with OpenVZ fails on wheezy. The symptom is that only the vlan that has the default gateway set works. I can move the default gateway to any vlan and that vlan will work. The other vlans work only when communicating with machines on the same vlan. Using tcpdump/wireshark I confirmed that traffic is coming in, but the replies never make it out the default GW unless they belong to the vlan with the default gateway. On the squeeze servers you can see the traffic going out the default GW.

The Solution:
Turns out you need to set net.ipv4.conf.default.rp_filter = 2 (or 0 for no spoof protection). With the strict filter (1), every vlan that does not hold the default gw is broken. More details and links will be posted later. Unfortunately I didn't find the links with the solution until I had already worked out that the issue was net.ipv4.conf.default.rp_filter.

I originally missed this in testing because you need to restart networking after making the change. A server built from scratch with the defaults has rp_filter = 0, and the text in the sysctl.conf file says "Uncomment the next two lines to enable Spoof protection (reverse-path filter)." This pretty clearly was the issue. Like most problems it seems pretty obvious once you have the solution.

Sadly I tested twice to make sure the changes I had made were not causing the problem. The first test failed because I had not restarted the network or the server after reverting the changes to rp_filter. The second time I have no idea how I missed it on a clean build of a new server: after building the server and only changing the network config it presented the same symptoms, so obviously I made a change or missed something. Hopefully this post will save someone else some time.
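To see which mode each interface is actually running, something like the sketch below can help. The function names are my own invention, and the optional directory argument exists only so the function can be pointed at a test tree instead of the live /proc path.

```shell
#!/bin/sh
# Translate an rp_filter value into the mode name used in the kernel docs.
rp_filter_mode() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Print "<interface> <value> <mode>" for every interface found under a
# conf directory (defaults to the live /proc/sys/net/ipv4/conf).
rp_filter_report() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        value=$(cat "$f")
        echo "$iface $value $(rp_filter_mode "$value")"
    done
}

rp_filter_report
```

Keep in mind that the kernel applies the higher of the "all" value and the per-interface value, so a strict setting under "all" still wins over a loose per-interface one.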

Cisco Setup:
Cisco Hardware
We are using Cisco Nexus 7000 switches with a gigabit ethernet module that supports 802.3ad. For more information regarding the different bonding options you can check out this link.

Setup the port channel
                                                                                                                                                                                                            
interface port-channel170
  description servername01
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  vpc 170
Configure the physical interfaces on the cisco switch:

interface Ethernet1/11
  description servername#1
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  spanning-tree port type edge
  channel-group 170 mode active
  no shutdown
                                                                                                                                                                                                                                            
interface Ethernet3/11
  description servername#2
  switchport mode trunk
  switchport trunk allowed vlan 45,48-49
  spanning-tree port type edge
  channel-group 170 mode active
  no shutdown
...
Make sure the "switchport trunk allowed vlan" line lists the vlans you are going to use on the linux server. Until these matched nothing worked for me.
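Since a mismatch between the switch and the server cost me so much time, here is a small sketch for comparing the two lists by script. Both function names are hypothetical. The first expands a Cisco-style list such as "45,48-49", and the second reports the IDs the server wants that the switch list lacks.

```shell
#!/bin/sh
# Expand a Cisco-style allowed-VLAN list such as "45,48-49" into
# space separated VLAN IDs.
expand_vlans() {
    out=""
    old_ifs=$IFS
    IFS=,
    for part in $1; do
        case "$part" in
            *-*)
                i=${part%-*}
                end=${part#*-}
                while [ "$i" -le "$end" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *) out="$out $part" ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

# Print every VLAN ID from the second argument (space separated) that
# the allowed list in the first argument does not contain.
missing_vlans() {
    allowed=" $(expand_vlans "$1") "
    for id in $2; do
        case "$allowed" in
            *" $id "*) ;;
            *) echo "$id" ;;
        esac
    done
}

expand_vlans "45,48-49"                    # prints: 45 48 49
missing_vlans "45,48-49" "45 48 49 50"     # prints: 50
```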

Server Hardware: The current server we are using is a DL360pG8, which has a Broadcom tg3 4 port card. This card has had several reported issues. To rule it out, I later installed a base wheezy system on an older server that was known to work with our configuration under squeeze and our current Nexus 7000 switch. This produced the same issues reported here. Before building that new server I had also tried the backport kernel to further rule out drivers.
lspci | grep -i broad
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
...
Linux Network Config:
Install the required packages and load the bonding module
apt-get install vlan ifenslave
Interfaces Config: /etc/network/interfaces
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0

auto bond0
iface bond0 inet manual
        #bond-mode 802.3ad
        bond-mode 4
        bond-miimon 100
        bond_downdelay 200
        bond_updelay 200
        bond_xmit_hash_policy layer2+3
        bond_lacp_rate slow
        slaves eth0 eth1 eth2 eth3

auto vlan45
iface vlan45 inet static
        vlan_raw_device bond0  
        address 10.200.45.155  
        netmask 255.255.255.0  
        network 10.200.45.0
        broadcast 10.200.45.255

auto vlan48
iface vlan48 inet static
        vlan_raw_device bond0  
        address 10.200.48.121  
        netmask 255.255.255.0  
        network 10.200.48.0
        broadcast 10.200.48.255
        gateway 10.200.48.1

auto vlan49
iface vlan49 inet static
        vlan_raw_device bond0  
        address 10.200.49.155  
        netmask 255.255.255.0  
        network 10.200.49.0
        broadcast 10.200.49.255
I had also read posts from people having problems with the "pretty" or easy to read version above, so I also tried the configuration below, with the same results.
# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#allow-hotplug eth0

auto bond0
iface bond0 inet manual
        #bond-mode 802.3ad
        bond-mode 4
        bond-miimon 100
        bond_xmit_hash_policy layer2+3
        bond_lacp_rate slow
        slaves eth0 eth1 eth2 eth3

auto bond0.45
iface bond0.45 inet static
        address 10.200.45.155
        netmask 255.255.255.0

auto bond0.48
iface bond0.48 inet static
        address 10.200.48.121
        netmask 255.255.255.0   
        gateway 10.200.48.1

auto bond0.49
iface bond0.49 inet static
        address 10.200.49.155
        netmask 255.255.255.0
Troubleshooting:
On Linux
ServerName# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100  
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 4
        Actor Key: 17
        Partner Key: 32938
        Partner Mac Address: 00:23:04:ee:be:0a

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:24
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:25
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:26
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: d8:9d:67:2c:aa:27
Aggregator ID: 1
Slave queue ID: 0
ServerName# modinfo bonding
filename:       /lib/modules/3.2.0-4-amd64/kernel/drivers/net/bonding/bonding.ko
alias:          rtnl-link-bond  
author:         Thomas Davis, tadavis@lbl.gov and many others
description:    Ethernet Channel Bonding Driver, v3.7.1
version:        3.7.1
license:        GPL
srcversion:     0384DF6574E0ED31BA573D8
depends:
intree:         Y
vermagic:       3.2.0-4-amd64 SMP mod_unload modversions
parm:           max_bonds:Max number of bonded devices (int)
parm:           tx_queues:Max number of transmit queues (default = 16) (int)
parm:           num_grat_arp:Number of peer notifications to send on failover event (alias of num_unsol_na) (int)
parm:           num_unsol_na:Number of peer notifications to send on failover event (alias of num_grat_arp) (int)
parm:           miimon:Link check interval in milliseconds (int)
parm:           updelay:Delay before considering link up, in milliseconds (int)
parm:           downdelay:Delay before considering link down, in milliseconds (int)
parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
parm:           mode:Mode of operation; 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           primary:Primary network device to use (charp)
parm:           primary_reselect:Reselect primary slave once it comes up; 0 for always (default), 1 for only if speed of primary is better, 2 for only on active slave failure (charp)
parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner; 0 for slow, 1 for fast (charp)
parm:           ad_select:803.ad aggregation selection logic; 0 for stable (default), 1 for bandwidth, 2 for count (charp)
parm:           min_links:Minimum number of available links before turning on carrier (int)
parm:           xmit_hash_policy:balance-xor and 802.3ad hashing method; 0 for layer 2 (default), 1 for layer 3+4, 2 for layer 2+3 (charp)
parm:           arp_interval:arp interval in milliseconds (int)
parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
parm:           arp_validate:validate src/dst of ARP probes; 0 for none (default), 1 for active, 2 for backup, 3 for all (charp)
parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC; 0 for none (default), 1 for active, 2 for follow (charp)
parm:           all_slaves_active:Keep all frames received on an interfaceby setting active flag for all slaves; 0 for never (default), 1 for always. (int)
parm:           resend_igmp:Number of IGMP membership reports to send on link failure (int)
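When reading the /proc/net/bonding/bond0 output above, the thing I look for is that every slave reports the same Aggregator ID as the active aggregator. A minimal sketch of that check follows. The function name is made up, and it assumes the layout shown above, where the active aggregator's ID line is indented and the per-slave lines are not.

```shell
#!/bin/sh
# Print each slave whose Aggregator ID differs from the active aggregator.
# Takes the path to a /proc/net/bonding/<bond> file (or a saved copy).
bond_stray_slaves() {
    awk '
        # Indented line: ID of the active aggregator.
        /^[ \t]+Aggregator ID:/ { active = $3 }
        /^Slave Interface:/     { slave = $3 }
        # Unindented line: ID reported by the current slave.
        /^Aggregator ID:/ && slave != "" {
            if ($3 != active) print slave
        }
    ' "$1"
}

# Only run against the live file when it exists.
if [ -r /proc/net/bonding/bond0 ]; then
    bond_stray_slaves /proc/net/bonding/bond0
fi
```

No output means all slaves joined the same aggregator, which is what the listing above shows.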
I also used tcpdump to determine where the connections were getting lost, and looked at the captures using wireshark.
tcpdump -i any -U not port 22 -w /tmp/tcpdump_any_20131220.dump
This showed that traffic was coming in with no problem and that everything worked except connections to a vlan that did not hold the default gw, made from a machine outside that vlan. This makes it look like a routing issue within the OS. If anyone would find the dump lines interesting let me know and I can dig them up and post them.
Routing table.
route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.200.48.1     0.0.0.0         UG    0      0        0 vlan48
10.200.45.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan45
10.200.48.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan48
10.200.49.0     0.0.0.0         255.255.255.0   U     0      0        0 vlan49
 
ip route list
default via 10.200.48.1 dev vlan48
10.200.45.0/24 dev vlan45  proto kernel  scope link  src 10.200.45.155
10.200.48.0/24 dev vlan48  proto kernel  scope link  src 10.200.48.121
10.200.49.0/24 dev vlan49  proto kernel  scope link  src 10.200.49.155
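Because only the vlan holding the default route kept working, it is handy to pull that interface name out of the routing table by script. This is just a sketch with a made-up function name. It reads "ip route list" output on stdin.

```shell
#!/bin/sh
# Read "ip route list" output on stdin and print the device that
# carries the default route.
default_route_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") print $(i + 1)
    }'
}

# Usage on a live host:
#   ip route list | default_route_dev
```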
On Cisco
show interface port-channel 170
port-channel170 is up
 vPC Status: Up, vPC number: 170
  Hardware: Port-Channel, address: 44d3.cae5.50a2 (bia 44d3.cae5.50a2)
  Description: servername
  MTU 1500 bytes, BW 2000000 Kbit, DLY 10 usec
  reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is trunk
  full-duplex, 1000 Mb/s
  Input flow-control is off, output flow-control is off
  Switchport monitor is off
  EtherType is 0x8100
  Members in this channel: Eth1/11, Eth3/11
  Last clearing of "show interface" counters never
  52 interface resets
  30 seconds input rate 80 bits/sec, 0 packets/sec
  30 seconds output rate 1832 bits/sec, 2 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 112 bps, 0 pps; output rate 1.94 Kbps, 2 pps
  RX
    380152 unicast packets  113302 multicast packets  3248 broadcast packets
    496720 input packets  88421937 bytes
    0 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
Loaded Modules
lsmod | egrep '8021q|loop|bond'
8021q                  19291  0
garp                   13193  1 8021q
bonding                79169  0
loop                   22641  0
Links:
discard packets when the route for outbound traffic differs from the route of incoming traffic
linux_vlan_routing
openvz on debian
ubuntu bug report where I found my answer
bonding on debian
bonding on wheezy
bonding on wheezy
broadcom related post tg3
openvz on wheezy
Conclusion:
When you are making changes via sysctl and you use '-p' to load them don't forget to restart networking or the server. When you are in the thick of it remember to make your changes one step at a time so you can find the problem. Don't assume your first hunch is the answer.
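To keep changes to one step at a time, I find it easier to script the edit to the sysctl file than to hand edit it. The helper below is only a sketch with a name I made up. It replaces an existing line for the key, commented out or not, or appends a new one, and it takes the file as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# Set "key = value" in a sysctl-style file, replacing an existing
# (possibly commented-out) line for the key or appending a new one.
set_sysctl_conf() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^[ \t]*#?[ \t]*/, "", line)    # drop indent and a leading "#"
            split(line, kv, /[ \t]*=[ \t]*/)    # kv[1] is the key name
            if (kv[1] == k) {
                if (!seen) print k " = " v
                seen = 1
                next
            }
            print
        }
        END { if (!seen) print k " = " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, followed by the reload and restart described above:
#   set_sysctl_conf net.ipv4.conf.default.rp_filter 2 /etc/sysctl.conf
#   /sbin/sysctl -p
#   /etc/init.d/networking restart
```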

 

Posted by fugit on Fri 17 Feb 2012 at 15:44
Tags: none.

A Happy Friday post about fun one liners. Please post any one liners you have created that probably should have been a short shell script or any improvements for the below one liner. One improvement would be to get rid of the expr output.

Below was something I wrote just to quickly find if I had retired all the hosts that should have been retired from a txt file. I also used a slight variant of the code to check if DNS had been deleted.

UPDATE:
The reason for the expr was that I was having problems nesting the if statements inside the backticks. After mcortese's comment I updated the script to use an if statement outside of the backticks. RJC provided an alternative to backticks using the POSIX compliant "$()". I have updated the script but did not nest the if statements, which would be easier with "$()".

One Liner
for x in `grep KILL list.txt | awk '{ print $6 }'` ; do add=`host $x | grep -v NXDOMAIN | awk '{ print $4 }'` ; if [ "$add" ]; then ping -c1 -W1 $add >/dev/null; expr 1 / $? 2>/dev/null || echo $x ; fi ; done

Example txt from my file below. The servers were all local so I was able to reduce the timeout. In order for the above code to work you will need to be able to lookup the hostname and ping the ip returned:


Text File
KILL 300 62 running 10.200.5.25 woody.example.com
KILL 504 65 running 10.200.5.26 potato.example.org
     505 15 running 10.200.4.27 etch.example.net
KEEP 506 29 running 10.200.3.28 squeeze.example.co.uk
KILL 511 28 running 10.200.3.29 hamm.example.info
KILL 525 30 running 10.200.1.69 bo.example.xxx
KEEP 526 24 running 10.200.2.254 wheezy.example.tv

The Improved One Liner Based on Comments:
for x in $(awk '/KILL/{ print $6 }' /tmp/list.txt ) ; do res=1 ; add=$(host $x | awk '!/NXDOMAIN/{ print $4 }') ; if [ "$add" ]; then ping -c1 -W1 $add >/dev/null; res="$?"; fi; if [ "$res" -eq 0 ] ;then echo $x ; fi ; done
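For anyone who would rather have the short shell script this probably should have been, here is one possible version. It is a sketch, not what I ran. The function name is made up, and it matches on the "has address" lines of host output instead of filtering out NXDOMAIN, so only the first IPv4 address is pinged.

```shell
#!/bin/sh
# Print every host marked KILL in the list file that still resolves and
# still answers a single ping, i.e. hosts that have not been retired.
# Uses the same tools as the one liner: awk, host and ping.
unretired_hosts() {
    for name in $(awk '/KILL/ { print $6 }' "$1"); do
        # Keep only the first address host reports for the name.
        addr=$(host "$name" | awk '/has address/ { print $4; exit }')
        [ -n "$addr" ] || continue
        if ping -c1 -W1 "$addr" >/dev/null 2>&1; then
            echo "$name"
        fi
    done
}

# Usage:
#   unretired_hosts list.txt
```

Folding the ping into the if test means there is no exit status to carry between loop iterations.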

 

Posted by fugit on Fri 5 Aug 2011 at 20:03
The Problem: We ran out of IPs on an internal VLAN used for development environments. I wanted a way to utilize both the old and the new vlans on multiple openvz servers without losing any of the extra bandwidth provided by bonding. Furthermore I did not want to lose the ability to move VE's between different Hardware nodes.

Update 20110819: Per ian@ianbmacdonald.com comment "The parameters need to be bond_xmit_hash_policy and bond_lacp_rate. ... You can see in the /proc/net/bonding/bond0 output that the policy is set to "layer2" not "layer 2+3" as per the configuration (because of this error)." I have updated the /etc/network/interfaces entry below to reflect this.

The Solution:
Set up a new Debian Squeeze OpenVZ server with bonding (802.3ad) and vlan trunking (802.1Q). This article covers the process of getting vlan and bonding working on Debian Squeeze with a cisco switch running IOS.

Cisco Setup:
Cisco Hardware
We are using a cisco 6509 switch with a gigabit ethernet module that supports 802.3ad. For more information regarding the different bonding options you can check out this link. I have not tried getting this to work with a switch that is not 802.3ad (Dynamic link aggregation) capable.

Setup the port channel
interface Port-channel30
 description ServerName
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 48,49
 switchport mode trunk
 no ip address
end
Configure the physical interfaces on the cisco switch:
interface GigabitEthernet9/5
 description ServerName#1
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 48,49
 switchport mode trunk
 no ip address
 stack-mib portname ServerName#1
 no snmp trap link-status
 no cdp enable
 channel-protocol lacp
 channel-group 30 mode active
end

interface GigabitEthernet9/19
 description ServerName#2
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 48,49
 switchport mode trunk
 no ip address
 stack-mib portname ServerName#2
 no snmp trap link-status
 no cdp enable
 channel-protocol lacp
 channel-group 30 mode active
end
...
Make sure the "switchport trunk allowed vlan" line lists the vlans you are going to use on the linux server. Until these matched it would not work for me.

Linux Network Config:
Install the required packages and load the bonding module
apt-get install vlan ifenslave
modprobe bonding
Interfaces Config: /etc/network/interfaces
auto bond0
iface bond0 inet manual
        bond-mode 4
        bond-miimon 100
        bond_xmit_hash_policy layer2+3
        bond_lacp_rate slow
        slaves eth0 eth1 eth2 eth3

auto vlan48
iface vlan48 inet static
        vlan_raw_device bond0
        address 10.169.48.77
        netmask 255.255.255.0
        network 10.169.48.0
        broadcast 10.169.48.255
        gateway 10.169.48.1

auto vlan49
iface vlan49 inet static
        vlan_raw_device bond0
        address 10.169.49.45
        netmask 255.255.255.0
        network 10.169.49.0
        broadcast 10.169.49.255
        gateway 10.169.49.1
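Once the interfaces are up, the 8021q module lists them in /proc/net/vlan/config. The sketch below pulls out the VLAN IDs stacked on one device so they can be compared with the switch side. The function name is hypothetical, and the parsing assumes the usual "name | id | device" layout of that file, so check it against your kernel.

```shell
#!/bin/sh
# Print the VLAN IDs configured on top of a device, one per line.
# Arguments: device name, optional path to a vlan config file
# (defaults to /proc/net/vlan/config).
vlan_ids_on() {
    dev=$1
    file=${2:-/proc/net/vlan/config}
    awk -F'|' -v dev="$dev" '
        NF == 3 {
            gsub(/[ \t]/, "", $2)    # VLAN ID column
            gsub(/[ \t]/, "", $3)    # underlying device column
            if ($3 == dev) print $2
        }
    ' "$file"
}

# Usage on a live host:
#   vlan_ids_on bond0
```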
If you happen to be using openvz I set the below for /etc/sysctl.conf. I have removed all of the comments and blank lines. You do not need this if you are not using OpenVZ.
egrep -v '^#|^$' /etc/sysctl.conf 
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.conf.eth0.proxy_arp=1
net.ipv4.conf.bond0.proxy_arp=1
net.ipv4.conf.default.forwarding=1 
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.ip_forward=1 
net.ipv4.conf.all.rp_filter = 0
kernel.sysrq = 1
net.ipv4.conf.default.send_redirects = 1
net.ipv4.conf.all.send_redirects = 0
fs.file-max = 100000
sysctl.conf is only read at bootup, so you need to run the command below to load the file right away.
/sbin/sysctl -p

Troubleshooting:
On Linux
ServerName# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 7
        Number of ports: 4
        Actor Key: 17
        Partner Key: 30
        Partner Mac Address: 00:15:2c:79:c4:c0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: d4:85:64:54:1d:5c
Aggregator ID: 7

Slave Interface: eth1
MII Status: up
Link Failure Count: 0
Permanent HW addr: d4:85:64:54:1d:5e
Aggregator ID: 7

Slave Interface: eth2
MII Status: up
Link Failure Count: 1
Permanent HW addr: d4:85:64:54:1d:84
Aggregator ID: 7

Slave Interface: eth3
MII Status: up
Link Failure Count: 1
Permanent HW addr: d4:85:64:54:1d:86
Aggregator ID: 7
ServerName# modinfo bonding 
filename:       /lib/modules/2.6.32-5-openvz-amd64/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis@lbl.gov and many others
description:    Ethernet Channel Bonding Driver, v3.5.0
version:        3.5.0
license:        GPL
srcversion:     C0EFCD8CB4AC214A8146EC2
depends:        
vermagic:       2.6.32-5-openvz-amd64 SMP mod_unload modversions 
parm:           max_bonds:Max number of bonded devices (int)
parm:           num_grat_arp:Number of gratuitous ARP packets to send on failover event (int)
parm:           num_unsol_na:Number of unsolicited IPv6 Neighbor Advertisements packets to send on failover event (int)
parm:           miimon:Link check interval in milliseconds (int)
parm:           updelay:Delay before considering link up, in milliseconds (int)
parm:           downdelay:Delay before considering link down, in milliseconds (int)
parm:           use_carrier:Use netif_carrier_ok (vs MII ioctls) in miimon; 0 for off, 1 for on (default) (int)
parm:           mode:Mode of operation : 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           primary:Primary network device to use (charp)
parm:           lacp_rate:LACPDU tx rate to request from 802.3ad partner (slow/fast) (charp)
parm:           ad_select:803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2) (charp)
parm:           xmit_hash_policy:XOR hashing method: 0 for layer 2 (default), 1 for layer 3+4 (charp)
parm:           arp_interval:arp interval in milliseconds (int)
parm:           arp_ip_target:arp targets in n.n.n.n form (array of charp)
parm:           arp_validate:validate src/dst of ARP probes: none (default), active, backup or all (charp)
parm:           fail_over_mac:For active-backup, do not set all slaves to the same MAC.  none (default), active or follow (charp)
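The hash policy problem from the update is easy to miss by eye, so here is a sketch of a check that compares the policy the driver reports with the one asked for in the interfaces file. Both function names are my own.

```shell
#!/bin/sh
# Print the transmit hash policy name from a /proc/net/bonding file,
# e.g. "layer2" or "layer2+3".
bond_hash_policy() {
    sed -n 's/^Transmit Hash Policy: \([^ ]*\).*/\1/p' "$1"
}

# Return 0 when the reported policy matches the wanted one; otherwise
# print a short message and return 1.
check_hash_policy() {
    have=$(bond_hash_policy "$1")
    if [ "$have" = "$2" ]; then
        return 0
    fi
    echo "hash policy is '$have', expected '$2'"
    return 1
}

# Usage on a live host:
#   check_hash_policy /proc/net/bonding/bond0 layer2+3
```

Run against the output above it would complain, because the driver still reports layer2.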
On Cisco
show interfaces port-channel 30
Port-channel30 is up, line protocol is up (connected)
  Hardware is EtherChannel, address is 0013.80c0.fa4c (bia 0013.80c0.fa4c)
  Description: Punkinpuss
  MTU 1500 bytes, BW 4000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s
  input flow-control is off, output flow-control is off
  Members in this channel: Gi9/5 Gi9/19 Gi11/45 Gi12/45 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output never, output hang never
  Last clearing of "show interface" counters 4w3d
  Input queue: 0/2000/7/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 34000 bits/sec, 8 packets/sec
  5 minute output rate 120000 bits/sec, 112 packets/sec
     13303252 packets input, 1748466512 bytes, 0 no buffer
     Received 103127 broadcasts (101124 multicasts)
     2 runts, 0 giants, 0 throttles
     5 input errors, 0 CRC, 0 frame, 2 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     111206034 packets output, 42975015356 bytes, 0 underruns
     3 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
Links:
openvz on debian
ubuntu bug report where I found my answer
bonding on debian
bonding on debian in a vmware instance

Conclusion:
I had a hard time finding all of the information required to setup vlan and bonding under squeeze so I put this howto together. Please feel free to post any questions or comments.

 

Posted by fugit on Thu 7 Apr 2011 at 13:47
Tags: none.
The Problem:
Setup selenium to work with firefox on a headless server. I wanted selenium to run as non root and start via init.d.

The Solution:
Install xvfb and firefox via apt and download the selenium jar to /usr/local/selenium. Then setup init scripts for xvfb and selenium.

xvfb
apt-get install xvfb
Setup the init script /etc/init.d/local-xvfb
#!/bin/bash
### CONFIG ###
XPORT=13
USER=selenium
### CONFIG ###

if [ -z "$1" ]; then
echo "`basename $0` {start|stop}"
exit
fi

case "$1" in
start)
su $USER -c "/usr/bin/Xvfb :$XPORT &"
;;

stop)
su $USER  -c "killall Xvfb"
;;
esac
Test that xvfb init.d is working
/etc/init.d/local-xvfb start
ps aux | grep -i xvfb
/etc/init.d/local-xvfb stop
ps aux | grep -i xvfb
/etc/init.d/local-xvfb start
Setup local-xvfb to start on reboot.
update-rc.d local-xvfb defaults 10


firefox(iceweasel)
Install iceweasel.
apt-get install iceweasel
That was easy... next.
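One thing to keep in mind is that a browser started on a headless box has to be told which X display to use. With XPORT=13 in the xvfb script that would be ":13". The wrapper below is a small sketch with a made-up name. It runs any command with DISPLAY pointed at a given display number, and it is my assumption that this is how you would want to launch things by hand for a quick test.

```shell
#!/bin/sh
# Run a command with DISPLAY pointing at a local X display number.
# Arguments: display number, then the command and its arguments.
with_display() {
    display_num=$1
    shift
    env DISPLAY=":$display_num" "$@"
}

# Example: show the value a child process would see.
with_display 13 sh -c 'echo "DISPLAY is $DISPLAY"'
```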

Selenium
Create the selenium user and download the jar for selenium.
addgroup selenium  
useradd -g selenium  -G selenium selenium 
mkdir -p /usr/local/selenium/tmp /var/log/selenium
cd /usr/local/selenium
wget http://selenium.googlecode.com/files/selenium-server-standalone-2.0b3.jar
chown -R selenium:selenium /usr/local/selenium /var/log/selenium
Setup the init.d/ script to start selenium service on reboot. In order to get this to work as a non-root user the stop section of the script is not great. Please let me know if anyone has a better way and I'll update the script.
Setup the init script /etc/init.d/local-selenium
#!/bin/bash
### CONFIG ### 
# Based on http://robfan.com/post/122618829/continuous-integration-selenium-firefox-flash
SELENIUM_HOME=/usr/local/selenium
LOG_DIR=/var/log/selenium
ERROR_LOG=$LOG_DIR/selenium_error.log
STD_LOG=$LOG_DIR/selenium_std.log
TMP_DIR=$SELENIUM_HOME/tmp
PID_FILE=$TMP_DIR/selenium.pid
JAVA=/usr/bin/java
SELENIUM_APP="$SELENIUM_HOME/selenium-server-standalone-2.0b3.jar"
USER=selenium
SELF=`basename $0`
### END  CONFIG ### 
case "${1:-''}" in
        'start')
                if test -f $PID_FILE
                then
                  PID=`cat $PID_FILE`
                  if  ps --pid $PID >/dev/null  ;
                  then
                        echo "Selenium is running...$PID"
                        exit 0 
                  else
                        echo "Selenium isn't running..."
                        echo "Removing stale pid file: $PID_FILE"
                        rm -f $PID_FILE
                  fi
                fi
                  
                echo "Starting Selenium..."
                #echo "COMMAND: su $USER -c \"$JAVA -jar $SELENIUM_APP >$STD_LOG 2>$ERROR_LOG &\""

                su $USER -c "$JAVA -jar $SELENIUM_APP >$STD_LOG 2>$ERROR_LOG &"
                error=$?
                if test $error -gt 0
                then
                        echo "${bon}Error $error! Couldn't start Selenium!${boff}"
                fi
                ps  -C java -o pid,cmd | grep $SELENIUM_APP  | awk {'print $1 '} > $PID_FILE
        ;;
        'stop')
                if test -f $PID_FILE
                then
                        echo "Stopping Selenium..."
                        PID=`cat $PID_FILE`
                        su $USER -c "kill -3 $PID"
                        if kill -9 $PID ;
                                then
                                        sleep 2
                                        test -f $PID_FILE && rm -f $PID_FILE
                                else
                                        echo "Selenium could not be stopped..."
                                fi
                else
                        echo "Selenium is not running."
                fi
                ;;
        'restart')
                if test -f $PID_FILE
                then
                        su $USER -c "kill -HUP `cat $PID_FILE`"
                        test -f $PID_FILE && rm -f $PID_FILE
                        sleep 1
                        su $USER -c "$JAVA -jar $SELENIUM_APP >$STD_LOG 2>$ERROR_LOG &"
                        error=$?
                        if test $error -gt 0
                        then
                                echo "${bon}Error $error! Couldn't start Selenium!${boff}"
                        fi
                        ps  -C java -o pid,cmd | grep $SELENIUM_APP  | awk {'print $1 '} > $PID_FILE

                        echo "Reload Selenium..."
                else
                        echo "Selenium isn't running..."
                fi
                ;;
        'status')
                if test -f $PID_FILE
                then
                  PID=`cat $PID_FILE`
                  if  ps --pid $PID >/dev/null ;
                  then
                        echo "Selenium is running...$PID"
                  else
                        echo "Selenium isn't running..."
                  fi
                else
                        echo "Selenium isn't running..."
                fi
                ;;
        *)      # no parameter specified
                echo "Usage: $SELF start|stop|restart|status"
                exit 1
        ;;
esac
Test that selenium init script is working.
/etc/init.d/local-selenium start
/etc/init.d/local-selenium status
/etc/init.d/local-selenium restart
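Since the stop section is the part I am least happy with, here is one possible gentler approach, offered as a sketch and not something I have wired into the init script. It sends TERM, waits a few seconds for the process to go away, and only then falls back to KILL. The function name and the default timeout are assumptions.

```shell
#!/bin/sh
# Ask a process to exit with TERM, wait up to <timeout> seconds, and
# only then fall back to KILL. Returns 0 once the process is gone.
# Arguments: pid, optional timeout in seconds (default 10).
stop_pid() {
    pid=$1
    timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0    # nothing to stop
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    ! kill -0 "$pid" 2>/dev/null
}

# Possible use inside the stop) branch:
#   stop_pid "`cat $PID_FILE`" 10 && rm -f $PID_FILE
```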
Setup local-selenium to start on reboot.
update-rc.d local-selenium defaults 95 5


Conclusion
I hope this helps anyone looking to set up Selenium to start as a service without running it as root.

 

Posted by fugit on Thu 27 Jan 2011 at 19:53
The Problem:
Integrating Debian (lenny) into an Active Directory (2008) forest with multiple trusted domains. We wanted to leverage AD for account management and authentication, including groups. One of the goals was to avoid modifying the accounts in AD: we did not want to enter unix attributes in AD for GID or UID.

The Solution:
We use Samba's winbind, kerberos (krb5), nsswitch and pam to leverage AD, all deployed and managed via puppet.

winbind
First, this does not require a full installation of samba. We are only going to use the winbind portion of samba to make this work. Also, I am using the backports version of winbind.
Information on using backports can be found here

Install the required packages for winbind:
apt-get install -t lenny-backports winbind samba-common-bin

Now we need to configure winbind. The file we will modify is /etc/samba/smb.conf. Below will work if you are just using winbind. There are other sections required if you will be using other features of samba.
 
/etc/samba/smb.conf
[global]
   workgroup = WORKGROUP1
   password server = ad1.domain1.com
   realm = DOMAIN1.COM
   security = ads
   template shell = /bin/bash
   winbind offline logon = false
   winbind separator = +

   kerberos method = secrets and keytab
   client ntlmv2 auth = yes
   winbind use default domain = yes
   winbind enum users = yes
   winbind enum groups = yes
   winbind nss info = rfc2307
   idmap config DOMAIN1:backend = rid
   idmap config DOMAIN1:base_rid = 0
   idmap config DOMAIN1:range = 100000 - 199999

   idmap config DOMAIN2:backend = rid
   idmap config DOMAIN2:base_rid = 0
   idmap config DOMAIN2:range = 200000 - 299999


   # Map any users/groups that are not in the trusted domains to this:
   idmap backend = tdb
   idmap uid = 900000-950000
   idmap gid = 900000-950000

   # this is set by default (run testparm to see it)
   passdb backend = tdbsam

   # Refresh kerberos tickets
   winbind refresh tickets = yes

The separator is changed in the above configuration so that many of the unix tools work with the domain accounts. The default separator is "\", which does not work with tools such as ssh. If you only have one domain this is not strictly necessary.
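To make the effect of the separator concrete, here is a minimal sketch of how a script could split a winbind account name into its domain and user parts. The function name split_account is my own, and the sample names are the ones used later in this post; accounts without a separator are assumed to belong to the default domain because of "winbind use default domain = yes".

```shell
#!/bin/sh
# Mirrors "winbind separator = +" from smb.conf above.
SEP='+'

# Print "<domain> <user>" for an account name such as DOMAIN2+tempus.
# split_account is a hypothetical helper, not part of samba.
split_account() {
    account=$1
    case $account in
        *"$SEP"*)
            domain=${account%%"$SEP"*}   # text before the first separator
            user=${account#*"$SEP"}      # text after the first separator
            ;;
        *)
            # No separator: assume the account is in the default domain
            domain='(default)'
            user=$account
            ;;
    esac
    printf '%s %s\n' "$domain" "$user"
}

split_account 'DOMAIN2+tempus'
split_account 'fugit'
```

A '+' needs no quoting or escaping in the shell, which is a large part of why it is easier to live with than a backslash.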

kerberos
First install the required packages for kerberos to work. Note that samba-common-bin, installed above, is required for the "net" command.
apt-get install krb5-clients krb5-user ntp
Now we need to configure kerberos. This configuration is for an AD forest with multiple domains in a trust. If you only have one domain you can remove the parts for multiple domains.
/etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = DOMAIN1.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 forwardable = yes
 default_keytab_name = FILE:/etc/krb5.keytab

[realms]
DOMAIN1.COM = {
  kdc = ad1.domain1.com:88
  kdc = ad2.domain1.com:88
  admin_server = ad1.domain1.com:749
  master_kdc = ad1.domain1.com
 }
DOMAIN2.COM = {
  kdc = ad1.domain2.com:88
}

[domain_realm]
 .domain1.com = DOMAIN1.COM
 domain1.com = DOMAIN1.COM
 .domain2.com = DOMAIN2.COM
 domain2.com = DOMAIN2.COM

[appdefaults]
 pam = {
   debug = false
   ticket_lifetime = 36000
   renew_lifetime = 36000
   forwardable = true
   krb4_convert = false
 }
Join your server to the domain:
 
net ads join member -U {administrator}
Test the join:
net ads testjoin
NTP:
ntp is included in the install of kerberos because kerberos depends on the servers' clocks being correct. It should be pointed to the same ntp server as your AD servers. I'll be happy to be more verbose in this section if anyone has any questions.
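As a rough illustration of why the clocks matter: both MIT Kerberos (the clockskew setting) and Active Directory default to a tolerance of 5 minutes, and authentication fails beyond that. The sketch below only compares two epoch timestamps; within_skew and MAX_SKEW are names I made up, and how you obtain the remote timestamp is left open.

```shell
#!/bin/sh
# Default tolerance in seconds (5 minutes); adjust if your realm differs.
MAX_SKEW=300

# Succeed if two epoch timestamps differ by at most the allowed skew.
# Usage: within_skew LOCAL_EPOCH REMOTE_EPOCH [MAX_SECONDS]
within_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - $diff ))
    fi
    [ "$diff" -le "${3:-$MAX_SKEW}" ]
}

# Hypothetical timestamps: the second clock is 120 seconds ahead.
if within_skew 1296157980 1296158100; then
    echo "clock skew is within tolerance"
else
    echo "clock skew too large for kerberos"
fi
```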


nsswitch
nsswitch.conf is the System Databases and Name Service Switch configuration file, that is part of the base-files package in Debian (more information ).
The file /etc/nsswitch.conf needs to be changed to use winbind for passwd, group, and shadow:
 /etc/nsswitch.conf (snippet)
passwd:     files winbind 
group:      files winbind 
shadow:     files winbind 
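Since this file is pushed out by configuration management, a quick sanity check can be handy. The following is a minimal sketch (check_nsswitch is a hypothetical helper) that reports whether passwd, group and shadow list winbind in a given nsswitch.conf-style file.

```shell
#!/bin/sh
# Report, per database, whether winbind is listed as a source.
# Returns non-zero if any of the three databases lacks winbind.
check_nsswitch() {
    file=$1
    status=0
    for db in passwd group shadow; do
        if grep -Eq "^${db}:.*[[:space:]]winbind([[:space:]]|\$)" "$file"; then
            echo "$db: winbind enabled"
        else
            echo "$db: winbind MISSING"
            status=1
        fi
    done
    return $status
}

# Example (not run here):
#   check_nsswitch /etc/nsswitch.conf
```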

pamd
pam is the Pluggable Authentication Modules for Linux (more information ).
There are several files we need to change in order to get authentication working with pam: common-password, common-session, common-account and common-auth. Need more details about each section and why they are changed to ...

common-password
 /etc/pam.d/common-password
password  sufficient   pam_unix.so nullok obscure md5
password  sufficient   pam_winbind.so use_first_pass 
password  required     pam_deny.so

common-session
 /etc/pam.d/common-session
session required pam_unix.so
session required pam_mkhomedir.so umask=0022 skel=/etc/skel

common-account
 /etc/pam.d/common-account
account sufficient        pam_winbind.so 
account sufficient        pam_unix.so
account required          pam_deny.so

common-auth
Need to explain this section, including why I am using the SID as opposed to the name or gid. Also, how does one get the SID using getent?
 /etc/pam.d/common-auth
auth    sufficient       pam_winbind.so require_membership_of=S-x-x-xx-xxxxxxxx-xxxxxxxxxx-xxxxxxxxxx-2777
auth    sufficient       pam_winbind.so require_membership_of=S-x-x-xx-xxxxxxxx-xxxxxxxxxx-xxxxxxxxxx-1190
auth    sufficient      pam_unix.so nullok_secure use_first_pass
auth    required        pam_deny.so
We are using the SID in common-auth because it is a unique identifier, as opposed to the rid or group name, which are not guaranteed to be unique.
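As far as I know getent itself does not report SIDs; the usual way to look one up is wbinfo from the winbind package, for example wbinfo -n 'DOMAIN1+somegroup' (the group name here is made up). The sketch below does not talk to AD at all. It only lists the SIDs that a PAM file already requires and pulls the RID off the end of a SID; both function names are hypothetical.

```shell
#!/bin/sh
# List the SIDs (or names) given in require_membership_of= options,
# one per line, ignoring commented-out lines. pam_winbind also accepts
# a comma-separated list, so commas are split onto separate lines.
list_required_sids() {
    grep -v '^[[:space:]]*#' "$1" |
        sed -n 's/.*require_membership_of=\([^[:space:]]*\).*/\1/p' |
        tr ',' '\n'
}

# The last dash-separated field of a SID is the RID.
sid_rid() {
    echo "${1##*-}"
}

# Example (not run here):
#   list_required_sids /etc/pam.d/common-auth
```

The RID on its own is only unique within one domain, which is the reason for matching on the full SID in a multi-domain forest.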

Overview
In order to test you can use getent (man). Using getent you should now be able to find a user on the primary domain with "getent passwd | grep {user}". The results should look something like below:
getent passwd | grep fugit
fugit:*:101234:100123:Fugit Fugit:/home/DOMAIN1/fugit:/bin/sh

You should be able to run the command "getent group | grep DOMAIN2" and see the AD groups for domain2. You can do the same for users with the command "getent passwd | grep {user}"
 
getent passwd | grep tempus
DOMAIN2+tempus:*:202132:200123:tempus:/home/DOMAIN2/tempus:/bin/bash
In the above output please notice the '+' after the domain. This is needed in order to allow common unix tools such as ssh to work. If you are seeing all of your users but ssh isn't working, please ensure you are using a '+' instead of the default '\' as the domain separator. Some troubleshooting notes still need to be added here.
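The uid in the getent output also tells you which idmap range, and therefore which domain, an account came from. Here is a minimal sketch using the ranges from the smb.conf above. It assumes the rid backend computes the unix id as RID - base_rid + the low end of the range (base_rid is 0 here); id_domain and passwd_uid are names I made up.

```shell
#!/bin/sh
# Map a uid/gid to the idmap range it falls in.
# Ranges are copied from the smb.conf in this post.
id_domain() {
    id=$1
    if [ "$id" -ge 100000 ] && [ "$id" -le 199999 ]; then
        echo "DOMAIN1 rid=$(( $id - 100000 ))"
    elif [ "$id" -ge 200000 ] && [ "$id" -le 299999 ]; then
        echo "DOMAIN2 rid=$(( $id - 200000 ))"
    elif [ "$id" -ge 900000 ] && [ "$id" -le 950000 ]; then
        echo "fallback (tdb)"
    else
        echo "local or unmapped"
    fi
}

# Extract the third field (the uid) from a getent passwd line.
passwd_uid() {
    echo "$1" | cut -d: -f3
}

line='DOMAIN2+tempus:*:202132:200123:tempus:/home/DOMAIN2/tempus:/bin/bash'
id_domain "$(passwd_uid "$line")"
```

If an AD account shows up with an id in the 900000 range, it was mapped by the fallback backend rather than by one of the per-domain rid ranges.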

puppet
I am currently doing all of this via puppet except the "net ads join". I am hoping to be able to provide more details regarding handling this with puppet in the future.

Conclusion
I hope this was helpful to others trying to join linux servers to an Active Directory (2008) forest with multiple trusted domains.

References
battista article
Samba Guide

 

Posted by fugit on Mon 12 Jul 2010 at 21:42
Currently we use the Perl module SVN::Notify. This is normally combined with SVN::Notify::Config, which uses YAML for configuration.



A user requested that we do post-commit rsyncs only when a trigger file is updated. This is a quick blogpost about doing the trigger files.


They did not require any additional security they just wanted to be able to deploy via a trigger file.


Below is the script I used to allow for trigger files:

#!/bin/sh

#CONFIG
DEBUG=""
REPOS="$1" # RESET AFTER GETOPTS
REV="$2" # RESET AFTER GETOPTS
SVNLOOK=/usr/local/bin/svnlook
TRIGGER_PAIRS=":" #spaces between each pair

while getopts "d" optionName; do
case "$optionName" in
d) DEBUG="1";;
[?]) echo "Usage: $0 [-d] "
esac
done
shift $(($OPTIND - 1))

# SET AFTER GETOPTS
REPOS="$1"
REV="$2"

#MAIN
# CHECK FOR TRIGGERED SYNCS
for trigger_pair in $TRIGGER_PAIRS
do
    TRIGGER_FILE=`echo $trigger_pair | awk -F: '{ print $1 }'`
    COMMIT_FILE=`echo $trigger_pair | awk -F: '{ print $2 }'`

    # DEBUG
    if [ -n "$DEBUG" ]
    then
        echo "TRIGGER_FILE: $TRIGGER_FILE" # DEBUG
        echo "TRIGGER_PAIR $trigger_pair" # DEBUG
        echo "COMMIT_FILE: $COMMIT_FILE" # DEBUG
    fi

    # Run the commit script only if this revision changed the trigger file.
    # Shell variable names are case sensitive, so use the upper-case names set above.
    if [ -n "$( $SVNLOOK changed -r "$REV" "$REPOS" | egrep "$TRIGGER_FILE" )" ]
    then
        # DEBUG
        if [ -n "$DEBUG" ]
        then
            echo TRIGGER DEBUG: $REPOS/hooks/${COMMIT_FILE} "$REPOS" "$REV" # DEBUG
        fi

        $REPOS/hooks/${COMMIT_FILE} "$REPOS" "$REV"
    fi
done
# END CHECK FOR TRIGGERED SYNCS #

# NON TRIGGERED SYNCS
# CALL YAML:SVN::NOTIFY SYNC for
$REPOS/hooks/ "$REPOS" "$REV"
# END NON TRIGGERED SYNCS



You don't need all of the debug info but I found it helpful. There is a horrible hack that I was too lazy to fix for re-setting the repo and rev after getopts.


The script can take multiple pairs, each made of a trigger file and the path to be rsynced if the trigger file was updated. All of the work is done under main. First the variables are read in and then it checks to see if the trigger file was updated using svnlook.
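To show the pair format on its own, here is a minimal sketch that splits each pair with shell parameter expansion instead of awk. The trigger paths and hook script names are invented for the example, and split_pair is a hypothetical helper; nothing is run against a repository.

```shell
#!/bin/sh
# Hypothetical pairs: "<trigger file pattern>:<hook script>", space separated.
TRIGGER_PAIRS="trunk/deploy.trigger:sync-trunk.sh branches/qa/deploy.trigger:sync-qa.sh"

# Print "<trigger> <commit script>" for one pair.
split_pair() {
    echo "${1%%:*} ${1#*:}"   # before the first colon, after the first colon
}

for trigger_pair in $TRIGGER_PAIRS; do
    set -- $(split_pair "$trigger_pair")
    echo "a change matching '$1' would run hooks/$2"
done
```

Because the pairs are split on whitespace and the first colon, neither the trigger pattern nor the script name may contain spaces, and the trigger pattern may not contain a colon.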


At the end of the script I call other svn::notify::config setups that get synced without triggers.

Please post any comments if you would like more info on the trigger script or using svn::notify.

 

Posted by fugit on Fri 30 Apr 2010 at 21:58
THE PROBLEM
We were having an issue where some TLS connections were failing with "SSL_accept error from". A couple of domains were affected, but Microsoft was one of the larger legitimate ones we were having a problem with.

quick answer -> The Answer
Log Entry:
SSL_accept error from smtp.microsoft.com[131.107.115.212]: -1

The problem only started occurring after an upgrade from debian etch to debian Lenny. One server had not been upgraded yet and could successfully handle all mails for which the upgraded servers were getting the "SSL_accept error from". This meant something had changed during the upgrade process that was causing this error. To give you an idea of the scope of the issue, we get about 7500 TLS e-mails per day and 7000 were working fine. Only about 500 were failing on the upgraded mail servers and then working on the older etch server.

Setup Details
Here are more details on the different systems. The new servers were running debian Lenny with postfix 2.5.5-1.1 and openssl 0.9.8g-15+Lenny6. The system that hadn't been upgraded is running debian etch with postfix 2.3.8-2+etch1 and openssl 0.9.8c-4etch9, which is what all the others were running before the upgrade.

TROUBLE SHOOTING
The first thing I did was to check that my certs were good. Even though 7000 messages a day were working fine, I wanted to double check. Using openssl and the directions at
http://www.cyberciti.biz/faq/test-ssl-certificates-diagnosis-ssl-certificate/ I confirmed the certs and the fact that ssl was working.

One thing to note while testing TLS from openssl with the following command:
 openssl s_client -starttls smtp -crlf -connect mail2.xxx.com:25  -CApath ~/.cert/mail2.xxx.com -cipher RC4-MD5
I found out about a gotcha when running s_client within openssl. It is not a perfect client and has some limitations. I have always used all caps when troubleshooting in an SMTP conversation. This turned out to be a problem with openssl s_client: when used interactively it treats any input line that begins with a capital R as a request to renegotiate the session, so when doing RCPT TO: it kept RENEGOTIATING. You must use "rcpt to:" and NOT "RCPT TO:". The first note of this I could find was at http://archives.neohapsis.com/archives/postfix/2007-01/1334.html. Well, it is still a problem.

Why did I force the RC4-MD5 cipher? I had noticed most of the failures were using this cipher, just turned out to be a coincidence.

OK, back to what else I did to try and troubleshoot this issue. I made the mistake of trying to turn up TLS logging past level 2, which wasn't showing me anything. I had done this on a spamtrap server and the logs directory filled in under 2 hours, so this was not a good idea. I forced a log rotate and looked to peer logging. I turned on peer logging for several of the domains that were having the issue. The main.cf entries for postfix are below:
 postfix/main.cf (snippet)
debug_peer_list = microsoft.com, XXX.com, XXX.com
debug_peer_level = 3

After I had turned logging way up for just those domains I did not really see anything on the servers that were having the problems. It still looked like they were just connecting and then dropping off.

At this point I decided to try a brute force approach and upgraded the TLS and postfix packages to squeeze on one of the spamtrap servers. In order to test it I moved around the MX weights. When I came in the next day it took about 2 minutes to notice it was still having a problem. On the central log server I was monitoring the mail log for anything with TLS or SSL in the line. This gave a good picture of the problem. I quickly saw someone connect to the spamtrap server (now with a normal MX record), get dropped, and then switch to the secondary server running etch and have the transaction run with no problems.
tail -f /var/log/mail.log | egrep 'TLS|SSL'
primary1  postfix/smtpd[13528]: setting up TLS connection from mailserver.xxx.com[xx.217.202.16]
primary1  postfix/smtpd[13528]: SSL_accept error from mailserver.xxx.com[xx.217.202.16]: -1
primary1  postfix/smtpd[13528]: lost connection after STARTTLS from mailserver.xxx.com[xx.217.202.16]
primary2  postfix/smtpd[10291]: setting up TLS connection from mailserver.xxx.com[xx.217.202.16]
primary2  postfix/smtpd[10291]: SSL_accept error from mailserver.xxx.com[xx.217.202.16]: -1
primary2  postfix/smtpd[10291]: lost connection after STARTTLS from mailserver.xxx.com[xx.217.202.16]
secondary setting up TLS connection from mailserver.xxx.com[xx.217.202.16]
secondary postfix/smtpd[14665]: TLS connection established from mailserver.xxx.com[xx.217.202.16]: TLSv1 with cipher RC4-MD5 (128/128 bits)
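When the log is busy it helps to summarise which peers are failing rather than watching the tail. The following is a minimal sketch that counts "SSL_accept error" lines per client host from Postfix log text on stdin. count_ssl_errors is a name I made up, and it assumes the log lines look like the ones above.

```shell
#!/bin/sh
# Read Postfix log lines on stdin and print "<count> <host>" for every
# client that produced an SSL_accept error, most frequent first.
count_ssl_errors() {
    sed -n 's/.*SSL_accept error from \([^[]*\)\[.*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example (not run here):
#   count_ssl_errors < /var/log/mail.log
```

Comparing this list between the upgraded servers and the etch server is a quick way to confirm that the same peers fail on one and succeed on the other.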


OK, back to the drawing board. After some more searching I found http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=573748, which didn't exactly match, but I figured I would try compiling from source and include the extra library call. I was running out of ideas and I was getting desperate. That failed as quickly as the idea of upgrading. At this point I looked at how hard it would be to downgrade postfix and openssl. Well, that didn't look fun and wouldn't provide the answer, which was almost as important as fixing the issue.

During this time with the increased logging I was also using ssldump to try and get more information regarding the problem, both on the working server and on the servers that were having a problem.
ssldump -i eth0 -AadnkxX -k /etc/postfix/tls/key.pem 
ssldump -i eth0 -AadnkxX -k /etc/postfix/tls/key.pem > /tmp/ssldump.txt
Again the connections dropped off without any additional information in the logs on the primary servers running lenny and the secondary server reported nothing special.

I didn't mention that I had gone to IRC #postfix and #openssl with little luck. One person had suggested it could be a library mismatch. When I first got this answer there was only one domain having the problem, even if it was a very large 3rd party e-mail handler. However, as I continued to monitor the problem and noticed more legitimate e-mail servers having the same problem, that answer was no longer sufficient. I keep using microsoft as an example as they are very large, legitimate, and posting about them having a problem won't hurt anyone's feelings.

After revisiting IRC on freenode I failed to get any additional information regarding my problem. After not getting anywhere on freenode I thought I would ask my local friends on irc-debian.org. I asked "anyone want to help trouble shoot postfix openssl tls issue? Its starting to get to me :)"

SOLUTION
Lucky for me dkg volunteered to help look into the situation and we discussed some information back and forth. dkg had another idea "weight 20(secondary etch server) has a smaller handshake than the other ones." The secondary server running etch has a 17KB handshake and the others have a 21KB handshake. I go ahead and do a dpkg-reconfigure ca-certificates to reduce the size of the handshake. I figure the etch server is working so I remove any of the ca roots that are not on the etch server.

Lo and behold, I tell dkg I owe him a beer as this fixed the problem.

Some interesting notes
The root Authorities removed did not include Microsoft's root Authority and the TLS connections with them were Trusted once the issue was resolved. Another interesting problem that also got resolved: when I started writing the ssldumps to files I noticed that I was getting handshake errors about the length of the handshake being too short. After getting the root certificates file down below 20k (to ~17k), that error went away as well.
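If you want to keep an eye on this, here is a minimal sketch that reports the size and certificate count of a PEM bundle and compares the size to a byte limit. The 20000 byte default is only the threshold observed in this post, not a documented limit, check_bundle is a hypothetical helper, and which file your postfix actually uses as its CA file is something you need to confirm in main.cf.

```shell
#!/bin/sh
# Print the size and number of certificates in a PEM bundle.
# Succeeds only if the bundle is smaller than the byte limit.
# Usage: check_bundle FILE [LIMIT_BYTES]
check_bundle() {
    bundle=$1
    limit=${2:-20000}
    bytes=$(wc -c < "$bundle" | tr -d ' ')
    # grep -c exits non-zero when the count is 0, so ignore its status
    certs=$(grep -c 'BEGIN CERTIFICATE' "$bundle" || true)
    echo "$bundle: $bytes bytes, $certs certificates"
    [ "$bytes" -lt "$limit" ]
}

# Example (not run here), using the bundle Debian's ca-certificates builds:
#   check_bundle /etc/ssl/certs/ca-certificates.crt || echo "bundle over limit"
```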

I wrote all of this in the hopes that I could save someone else 2 weeks of losing their wits and having to find a very obscure solution to a difficult problem.

Please feel free to leave any comments, suggestions, questions or any corrections :)

 

Posted by fugit on Fri 30 Apr 2010 at 21:08
THE PROBLEM
We were having an issue where some TLS connections were failing with "SSL_accept error from". A couple of domains were affected, but Microsoft was one of the larger legitimate ones we were having a problem with.

For the Curious-> The Detailed version

SOLUTION
Lucky for me dkg volunteered to help look into the situation and we discussed some information back and forth. dkg had another idea "weight 20(secondary etch server) has a smaller handshake than the other ones." The secondary server running etch has a 17KB handshake and the others have a 21KB handshake. I go ahead and do a dpkg-reconfigure ca-certificates to reduce the size of the handshake. I figure the etch server is working so I remove any of the ca roots that are not on the etch server.

Lo and behold, I tell dkg I owe him a beer as this fixed the problem.

Some interesting notes
The root Authorities removed did not include Microsoft's root Authority and the TLS connections with them were Trusted once the issue was resolved. Another interesting problem that also got resolved: when I started writing the ssldumps to files I noticed that I was getting handshake errors about the length of the handshake being too short. After getting the root certificates file down below 20k (to ~17k), that error went away as well.

I wrote all of this in the hopes that I could save someone else 2 weeks of losing their wits and having to find a very obscure solution to a difficult problem.

Please feel free to leave any comments, suggestions, questions or any corrections :)

 

Posted by fugit on Sat 26 Nov 2005 at 04:04
Tags: none.
Just saw the weblog posting. Think that is a great idea.

Just had way too much food again the day after Thanksgiving.

Going to check out some other people's blogs. Hopefully I'll get motivated and put up stuff people might actually be interested in reading, after this week.


Fugit... going to enter a food coma. . .