Proxmox Support Forum

best way to backup whole node "system" partitions (boot and lvm)?

Hi,

I would like to be able to restore a PVE node exactly as it is now, by taking a full backup (boot and LVM partitions) to an external NAS share.
I would prefer a simple method, like booting from SystemRescueCd or another live CD.
I know a few tools (partimage, fsarchiver, dd...) that are easy to use, but I am unsure about the LVM partition, as I have never used those tools on that kind of volume.

[edit]
I just noticed that, after booting SystemRescueCd, fsarchiver probe finds dm-0, dm-1 and dm-2, which are the PVE root/data/swap volumes.

I could probably back up those dm-x "partitions", but if I need to rebuild the node, how should I proceed?
Is there anything documented?
[/edit]
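
For example, from the live CD I was thinking of something along these lines (just my rough idea; the NAS mount point /mnt/nas, the boot partition /dev/sda1 and the dm-0/dm-1 root/data mapping are assumptions on my side):

Code:

# boot partition and LVM metadata
fsarchiver savefs /mnt/nas/pve-boot.fsa /dev/sda1
vgcfgbackup -f /mnt/nas/pve-vg.cfg pve
# the logical volumes as seen through device-mapper (dm-0 = root, dm-1 = data here)
fsarchiver savefs /mnt/nas/pve-root.fsa /dev/dm-0
dd if=/dev/dm-1 bs=1M | gzip > /mnt/nas/pve-data.img.gz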

Any suggestions?
Thanks

Marco

Cannot start VM with vRAM >128G

The VM starts fine with 32, 64 or 96 GB of vRAM, but not with 128 GB or more. Here is the output I'm getting:

Code:

TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=68782ec6-2866-4bfb-88e5-2e4fca313166' -name geostorage -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -m 131072 -k en-us -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=03:00.0,id=hostpci0,bus=pci.0,addr=0x10' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:f83ee4efa04' -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/var/lib/vz/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=76:FE:6D:A4:B0:92,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout
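
The first thing I'm checking on the host before digging deeper (just a sanity check on my side) is whether there is actually enough free memory for a 128 GB guest and whether huge pages are involved:

Code:

free -g
grep -i huge /proc/meminfo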

VMBR1 Dropping Internet Connection

I added another NIC so I can give a Windows VM a "floating" subnet: I change the IP address inside Windows to reach physical machines on different subnets, instead of reconfiguring Proxmox and rebooting each time.

While downloading updates for the new VM, the Internet connection dropped and wouldn't come back until I rebooted. I tried to download the updates again, and the same thing happened. I then just browsed the Internet for a while, and about 10 minutes later the connection dropped again.

My /etc/network/interfaces looks like this:

Code:

# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

auto eth3
iface eth3 inet manual

auto bond0
iface bond0 inet static
        address  192.168.3.2
        netmask  255.255.255.0
        slaves eth1 eth2
        bond_miimon 100
        bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
        address  192.168.2.2
        netmask  255.255.255.0
        gateway  192.168.1.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge_ports eth3
        bridge_stp off
        bridge_fd 0
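
The next time the connection drops, this is roughly what I plan to capture before rebooting (just my own checklist; interface names taken from the config above):

Code:

# state of the uplink NICs and bridges at the moment of the drop
ip -s link show eth0
ip -s link show eth3
brctl show
# recent kernel messages (NIC resets, bridge/bond events)
dmesg | tail -n 50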

proxmox asus eeebox j1900 installation aborted

I apologize if this is not the proper place to post this, but I need some help installing Proxmox v3.4.

Scenario:
1) PC: EeeBox EB1036-B0534, Intel J1900
2) Installing from a USB flash drive

Problem:
The installation starts normally, and after detecting the network (which is marked as done after a couple of seconds), the next line is:

\nInstallation aborted - unable to continue ....

There is no log under /tmp or any other information. The installer also shows an error earlier in the process: ERROR could not insert 'video': unknown symbol in module...

However, if I install in debug mode, pass the vga=normal parameter and then hit Ctrl-D, it seems to get past the video problem (I'm not really sure; I assume so because it continues with "starting hotplug events" and two more lines with green OKs), but after detecting the network, as mentioned above, it gives me the same "Installation aborted" message quoted above.

any help?

[HOW-TO] Separate migration network - dirty fix

Hi all,

On the forum I have seen several topics asking how to make a Proxmox cluster use a different network for migration traffic.
For all my clusters I currently make a code change in QemuServer.pm to change the listening IP of the migration task (unfortunately hard-coded).

Currently I have 5 interfaces per hypervisor:
  • NIC1 and NIC2 (2 x Gigabit Ethernet) are configured as a bond for public network traffic only - no IP address configured - connected to two different gigabit switches.
  • NIC3 (1 x Gigabit Ethernet) is used for management traffic, internal only - IP address: 10.0.10.XX - connected to a gigabit switch.
  • NIC4 and NIC5 (2 x 10 Gigabit Ethernet) are configured as a bond for storage traffic only (DRBD, NFS, etc.) - IP address: 10.0.7.XX - connected to a 10G switch.


I know this is not the neatest way. Please note that if you update Proxmox to a newer version, the change will be lost.

HOW-TO:
  • First check the IP on the network you would like to use as the migration network. In my case I also use the storage network for live migration (IP 10.0.7.48 in this example).
  • Open the following file in your favourite file editor: /usr/share/perl5/PVE/QemuServer.pm
  • Search in the file for the variable: $migrate_uri
    The result will show the following: $migrate_uri = "tcp:${localip}:${migrate_port}";
  • By replacing "${localip}" with the IP you would like to use, Proxmox will be forced to listen on that IP address every time it receives a migration request (see the sed sketch after these steps).
    The result after the replacement (in my example, using IP 10.0.7.48) will be: $migrate_uri = "tcp:10.0.7.48:${migrate_port}";
  • Save the file and make sure it is saved by running the command:
    root@hypervisor48:~# cat /usr/share/perl5/PVE/QemuServer.pm | grep migrate_uri
    my $migrate_uri;
    $migrate_uri = "tcp:10.0.7.48:${migrate_port}";
    push @$cmd, '-incoming', $migrate_uri;
    print "migration listens on $migrate_uri\n" if $migrate_uri;

  • Restart the pvedaemon on the server by typing the following:
    root@hypervisor48:~# service pvedaemon restart
    Restarting PVE Daemon: pvedaemon.

  • Repeat the above steps on every node that should listen on that migration network.
  • Test the change by live-migrating a VM. You can see whether it worked from the line: starting online/live migration on 10.0.7.48:PORT
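
To apply the replacement without opening an editor, something like this should work on each node (a rough sketch of how I would script it; back up the file, double-check the result, and adjust the IP of course):

Code:

cp /usr/share/perl5/PVE/QemuServer.pm /usr/share/perl5/PVE/QemuServer.pm.bak
sed -i 's/tcp:\${localip}:/tcp:10.0.7.48:/' /usr/share/perl5/PVE/QemuServer.pm
grep migrate_uri /usr/share/perl5/PVE/QemuServer.pm
service pvedaemon restart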


If you have any questions, please do not hesitate to contact me.

[SOLVED] HELP! Lost root password!!!

Hello. I was smart enough to lose my root password for Proxmox. I've already tried to boot a Debian live CD and mount one of the drives, but that didn't work, probably because I'm using a ZFS volume in RAID 1. I've also tried to modify GRUB by adding init=/bin/bash so it boots into single-user mode. That doesn't work either, because I can't type the '=' sign: the GRUB prompt assumes an American keyboard layout and I have a Danish one. So is there anything else I can do other than reinstalling?
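
For reference, this is what I think I would have to do from a live environment that has ZFS support (just my guess; I'm assuming the default Proxmox pool name rpool and the root dataset rpool/ROOT/pve-1):

Code:

zpool import -f -R /mnt rpool
# if the root dataset is not mounted automatically:
zfs mount rpool/ROOT/pve-1
chroot /mnt /bin/bash
passwd root
exit
zpool export rpool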

Thank you in advance :D

IP of the VM or VZ in Datacenter config

Can I configure the IP addresses of the KVM and OpenVZ guests directly in Proxmox?
On my server (Proxmox) I have this /etc/network/interfaces:

Code:

auto lo
iface lo inet loopback


# device: eth0
auto  eth0
iface eth0 inet static
  address  176.9.18.135
  broadcast 176.9.18.159
  netmask  255.255.255.224
  gateway  176.9.18.129
  # default route to access subnet
  up route add -net 176.9.18.128 netmask 255.255.255.224 gw 176.9.18.129 eth0


iface eth0 inet6 static
  address 2a01:4f8:150:128d::2
  netmask 64
  gateway fe80::1


auto vmbr0
iface vmbr0 inet static
        address  176.9.204.54
        netmask  255.255.255.248
        gateway  176.9.18.135
        bridge_ports none
        bridge_stp off
        bridge_fd 0
        up ip route add 176.9.204.48/29 dev vmbr0

The virtual machines use IPs from the subnet 176.9.204.48/29.
But if I move to another server with different IP addresses, do I have to reconfigure each virtual machine?
I want to manage the VM IPs directly from Proxmox; is that possible?

Today I am using the no-subscription repository.
I want to activate a subscription if I find the right product for me.

Is there a tutorial for configuring the firewall from the web interface?

Thanks.

Very high load of the node

Hello,
Yesterday I upgraded from PVE 3.2 to 3.4, and today I have big problems. The load of the node suddenly grew at 12:00 and I can't find the cause. Some guests hang with the messages shown in the attached images (gw.png, load1.png, load2.png).

Code:

# iotop -d 10 -P
Total DISK READ:      21.17 K/s | Total DISK WRITE:      2.82 M/s
  PID  PRIO  USER    DISK READ  DISK WRITE  SWAPIN    IO>    COMMAND
10287 be/4 root        0.00 B/s    0.00 B/s  0.00 % 13.75 % kvm -id 104
 8691 be/4 root        6.33 K/s    8.61 K/s  0.00 % 13.54 % kvm -id 140
 9633 be/4 root        0.00 B/s    0.00 B/s  0.00 % 12.79 % kvm -id 111
10059 be/4 root        0.00 B/s    0.00 B/s  0.00 % 10.82 % kvm -id 156
 8895 be/4 root        0.00 B/s    0.00 B/s  0.00 %  7.17 % kvm -id 117
 9178 be/4 root        0.00 B/s    0.00 B/s  0.00 %  5.01 % kvm -id 119
 9277 be/4 root      405.13 B/s  44.31 K/s  0.00 %  2.08 % kvm -id 108
 7534 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.27 % [txg_sync]
10858 be/4 root      229.47 K/s  752.69 K/s  0.00 %  0.02 % kvm -id 113
12155 be/4 root      30.46 K/s  329.96 K/s  0.00 %  0.01 % kvm -id 116
 8423 be/4 root      810.26 B/s  25.72 K/s  0.00 %  0.01 % kvm -id 112
10481 be/4 root        0.00 B/s    4.35 K/s  0.00 %  0.01 % kvm -id 106
 1083 be/3 root        0.00 B/s 1215.38 B/s  0.00 %  0.00 % [jbd2/dm-0-8]
 2554 be/3 root        0.00 B/s 1620.51 B/s  0.00 %  0.00 % [jbd2/sda4-8]
10076 be/4 root        0.00 B/s    9.50 K/s  0.00 %  0.00 % kvm -id 110
30156 be/4 root        0.00 B/s  11.87 K/s  0.00 %  0.00 % kvm -id 109
29999 be/4 root      405.13 B/s  130.16 K/s  0.00 %  0.00 % kvm -id 153
 9801 be/4 root        0.00 B/s    4.75 K/s  0.00 %  0.00 % kvm -id 131
65437 be/4 root        0.00 B/s    2.77 K/s  0.00 %  0.00 % kvm -id 124
64509 be/4 root        0.00 B/s    3.17 K/s  0.00 %  0.00 % kvm -id 102
11121 be/4 root      810.26 B/s    5.54 K/s  0.00 %  0.00 % kvm -id 130
 9751 be/4 root        0.00 B/s    2.77 K/s  0.00 %  0.00 % kvm -id 129
11169 be/4 root      405.13 B/s  405.13 B/s  0.00 %  0.00 % kvm -id 127

Code:

# zpool iostat -v 10
...
              capacity    operations    bandwidth
pool        alloc  free  read  write  read  write
----------  -----  -----  -----  -----  -----  -----
pool2      2.20T  799G      0    66  32.7K  3.09M
  pve-csv2  2.20T  799G      0    66  32.7K  3.09M
cache          -      -      -      -      -      -
  sdb      55.9G  7.62M      0      1  21.0K  256K
----------  -----  -----  -----  -----  -----  -----

Code:

# iostat -d -x 10
...
Device:        rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda              0.00    1.30    1.10  237.30    5.40  3124.20    26.26    0.02    0.07    6.36    0.04  0.06  1.33
sdc              0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
sdb              0.00    0.00    2.10    0.00    75.75    0.00    72.14    0.00    0.67    0.67    0.00  0.67  0.14
dm-0              0.00    0.00    0.20    1.70    1.20    13.60    15.58    0.00    1.05  10.00    0.00  1.05  0.20
dm-1              0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
dm-2              0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
dm-3              0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  0.00  0.00
dm-4              0.00    0.00    0.10  98.30    0.60  2755.00    56.01    0.00    0.05    7.00    0.04  0.05  0.45

Code:

# top
top - 16:11:27 up 1 day,  2:07,  2 users,  load average: 4.94, 5.70, 6.35
Tasks: 1087 total,  1 running, 1086 sleeping,  0 stopped,  0 zombie
%Cpu(s):  2.5 us,  1.4 sy,  0.0 ni, 95.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem:    128840 total,    76732 used,    52107 free,      95 buffers
MiB Swap:    65535 total,        0 used,    65535 free,    4892 cached
 
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 12155 root      20  0 5157m 3.5g 4116 S    16  2.8  12:09.68 kvm
 29999 root      20  0 9231m 7.8g 3900 S    15  6.2  49:53.35 kvm
  9801 root      20  0 4852m 4.1g 3960 S    15  3.3 465:40.06 kvm
 64509 root      20  0 10.8g  10g 3972 S    10  8.0 245:27.25 kvm
 11169 root      20  0 1406m 1.0g 3772 S    8  0.8 114:50.26 kvm
  8423 root      20  0 3676m 3.1g 3808 S    6  2.5 113:30.77 kvm
 10858 root      20  0 9313m 5.2g 3788 S    5  4.2  89:14.78 kvm

Code:

# pveversion --verbose
proxmox-ve-2.6.32: 3.3-147 (running kernel: 3.10.0-1-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-3.10.0-1-pve: 3.10.0-5
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-37-pve: 2.6.32-147
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Code:

# pveperf
CPU BOGOMIPS:      110201.04
REGEX/SECOND:      921999
HD SIZE:          62.87 GB (/dev/mapper/pve-root)
BUFFERED READS:    499.85 MB/sec
AVERAGE SEEK TIME: 9.07 ms
FSYNCS/SECOND:    4271.21

Code:

# pveperf /pool2/VMs/images/
CPU BOGOMIPS:      110201.04
REGEX/SECOND:      942748
HD SIZE:          2970.82 GB (pool2/VMs)
FSYNCS/SECOND:    4683.57

Code:

# pveperf /mnt/sda4/images/
CPU BOGOMIPS:      110201.04
REGEX/SECOND:      923666
HD SIZE:          3023.67 GB (/dev/sda4)
BUFFERED READS:    349.94 MB/sec
AVERAGE SEEK TIME: 9.98 ms
FSYNCS/SECOND:    2474.56

Code:

# qm list | grep -v stopped
      VMID NAME                STATUS    MEM(MB)    BOOTDISK(GB) PID
      102 server102            running    10240            50.00 64509
      104 server104            running    512                2.00 10287
      106 server106            running    2048            300.00 10481
      108 server108            running    2048              4.00 9277
      109 server109            running    4096              50.00 30156
      110 server110            running    1024            150.00 100765
      111 server111            running    2048            100.00 9633
      112 server112            running    3072              32.00 8423
      113 server113            running    8192              48.00 10858
      115 server115            running    1024              50.00 10631
      116 server116            running    4096              32.00 12155
      117 server117            running    2048              4.00 8895
      119 server119            running    1024              16.00 9178
      122 server122            running    1024              50.00 10779
      124 server124            running    3072              40.00 65437
      127 server127            running    1024              30.00 11169
      128 server128            running    2048              8.00 11283
      129 server129            running    4096              50.00 9751
      130 server130            running    1024              50.00 11121
      131 server131            running    4096              40.00 9801
      140 server140            running    1024            150.00 8691
      153 server153            running    8192            150.00 29999
      156 server156            running    2048            150.00 10059
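
Since the VM images are on ZFS (pool2) and [txg_sync] shows up in iotop, one more thing I'm checking (just my own idea, not sure it is related) is the ARC size versus its configured limit:

Code:

grep -E "^(size|c|c_max) " /proc/spl/kstat/zfs/arcstats
cat /sys/module/zfs/parameters/zfs_arc_max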

Thanks.

VNC console error

Hi.
I'm experiencing a lot of problems with the VNC console.
If I try to open the VNC console of a KVM virtual machine, I get this error:

Failed to connect to server (code: 1006).

This happens on every PVE node, from Chrome, Safari and Firefox, and on all of my virtual machines.
The virtual machine is running, of course.

From the tasks log I see the following:

Code:

TASK ERROR: command '/bin/nc -l -p 5900 -w 10 -c '/usr/bin/ssh -T -o BatchMode=yes 192.168.60.1 /usr/sbin/qm vncproxy 101 2>/dev/null'' failed: exit code 255


If I execute this command from the console and then telnet to port 5900 of the node, the connection works:

Code:

root@node1:~# /bin/nc -l -p 5900 -w 10 -c '/usr/bin/ssh -T -o BatchMode=yes 192.168.60.1 /usr/sbin/qm vncproxy 101'
MyClient:~ mattia$ telnet 192.168.60.1 5900
Trying 192.168.60.1...
Connected to 192.168.60.1.
Escape character is '^]'.
RFB 003.008
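
What I plan to try next (my own idea) is running just the SSH part of that command, to see whether it prints an error that nc swallows:

Code:

/usr/bin/ssh -T -o BatchMode=yes 192.168.60.1 /usr/sbin/qm vncproxy 101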

My PVE cluster is updated:

Code:

root@node1:~# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Could you help me please?

No Quorum after Update

Hello,

After updating my two Proxmox nodes, I restarted the first one. Now I want to move the VMs to the first node and restart the second one.

Unfortunately, quorum is lost and I cannot move any VMs.

I have tried restarting the services pvedaemon, pvestatd, pveproxy and pve-cluster.

Is it OK if I just set the expected votes to 1 and then migrate the VMs? Will that work?
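
If that is acceptable, I assume the command would be something like this (please correct me if I'm wrong):

Code:

pvecm expected 1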

Here are some outputs from the two nodes that might help:

Node1
Code:

root@groemer01 ~ $ pvecm node
Node Sts Inc Joined Name
1 M 1464 2015-03-23 08:12:29 groemer01
2 X 1468 groemer02
root@groemer01 ~ $ pvecm status
Version: 6.2.0
Config Version: 22
Cluster Name: tsccluster
Cluster Id: 49332
Cluster Member: Yes
Cluster Generation: 1472
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 6
Flags:
Ports Bound: 0 177
Node name: groemer01
Node ID: 1
Multicast addresses: x.x.x.x
Node addresses: x.x.x.x
root@groemer01 ~ $ cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="22" name="tsccluster">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" p ower_wait="5"/>
</fencedevices>
<clusternodes>
<clusternode name="groemer01" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="groemer02" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<pvevm autostart="1" vmid="106"/>
</rm>
</cluster>
root@groemer01 ~ $ cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="22" name="tsccluster">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" p ower_wait="5"/>
</fencedevices>
<clusternodes>
<clusternode name="groemer01" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="groemer02" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<pvevm autostart="1" vmid="106"/>
</rm>
</cluster>

Node2
Code:

root@groemer02:~# pvecm status
Version: 6.2.0
Config Version: 22
Cluster Name: tsccluster
Cluster Id: 49332
Cluster Member: Yes
Cluster Generation: 1472
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: groemer02
Node ID: 2
Multicast addresses: x.x.x.x
Node addresses: x.x.x.x
root@groemer02:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="22" name="tsccluster">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" power_wait="5"/>
</fencedevices>
<clusternodes>
<clusternode name="groemer01" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="groemer02" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<pvevm autostart="1" vmid="106"/>
</rm>
</cluster>
root@groemer02:~# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="22" name="tsccluster">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
<fencedevice agent="fence_ipmilan" ipaddr="x.x.x.x" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" power_wait="5"/>
</fencedevices>
<clusternodes>
<clusternode name="groemer01" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="groemer02" nodeid="2" votes="1">
<fence>
<method name="1">
<device name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<rm>
<pvevm autostart="1" vmid="106"/>
</rm>
</cluster>

omping works as well.

Ceph - Bad performance with small IO

Hello everyone,

first of all I want to say thank you to each and every one in this community!
I've been a long-time reader (and user of PVE) and have gotten a lot of valuable information from this forum!

Right now the deployment of the Ceph cluster is giving me some trouble.
We were using DRBD, but since we are expanding and there are more nodes in the PVE cluster, we decided to switch to Ceph.

The 3 Ceph server nodes are connected via a 6x GbE LACP bond with jumbo frames over two stacked switches, and the Ceph traffic is on a separate VLAN.
Currently there are 9 OSDs (3x 15K SAS with BBWC per host).
The journal is 10 GB per OSD, on LVM volumes of an SSD RAID 1.
pg_num and pgp_num are set to 512 for the pool.
Replication is 3 and the CRUSH-Map is configured to distribute the requests over the 3 hosts.

The performance of the rados benchmarks is good:
rados -p test bench 60 write -t 8 --no-cleanup
Code:

Total time run:        60.187142
Total writes made:      1689
Write size:            4194304
Bandwidth (MB/sec):    112.250

Stddev Bandwidth:      48.3496
Max bandwidth (MB/sec): 176
Min bandwidth (MB/sec): 0
Average Latency:        0.28505
Stddev Latency:        0.236462
Max latency:            1.91126
Min latency:            0.053685

rados -p test bench 60 seq -t 8
Code:

Total time run:        30.164931
Total reads made:      1689
Read size:            4194304
Bandwidth (MB/sec):    223.969

Average Latency:      0.142613
Max latency:          2.78286
Min latency:          0.003772

rados -p test bench 60 rand -t 8
Code:

Total time run:        60.287489
Total reads made:      4524
Read size:            4194304
Bandwidth (MB/sec):    300.162

Average Latency:      0.106474
Max latency:          0.768564
Min latency:          0.003791

What makes me wonder are the "Min bandwidth (MB/sec): 0" and "Max latency: 1.91126" values in the write benchmark.

I've modified the Linux autotuning TCP buffer limits and the rx/tx ring parameters of the network cards (all Intel), which increased the bandwidth, but didn't help with the latency of small IO.
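
For reference, these are roughly the kinds of changes I made (my own values; adjust to your hardware, and the interface name eth0 is just a placeholder):

Code:

# larger rx/tx rings on the Intel NICs
ethtool -G eth0 rx 4096 tx 4096
# wider TCP autotuning buffer limits
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"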

For example in a wheezy-kvm-guest:
Code:

dd if=/dev/zero of=/tmp/test bs=512 count=1000 oflag=direct,dsync
512000 Bytes (512 kB) kopiert, 9,99445 s, 51,2 kB/s

dd if=/dev/zero of=/tmp/test bs=4k count=1000 oflag=direct,dsync
4096000 Bytes (4,1 MB) kopiert, 10,0949 s, 406 kB/s

I also put flashcache in front of the OSDs, but that didn't help much, and since there is 1 GB of cache on the RAID controller in front of the OSDs, I wonder why this is so slow in the guests.
Compared to the raw performance of the SSDs and the OSDs this is really bad...
Code:

dd if=/dev/zero of=/var/lib/ceph/osd/ceph-2/test bs=512 count=1000 oflag=direct,dsync
512000 Bytes (512 kB) kopiert, 0,120224 s, 4,3 MB/s

dd if=/dev/zero of=/var/lib/ceph/osd/ceph-2/test bs=4k count=1000 oflag=direct,dsync
4096000 Bytes (4,1 MB) kopiert, 0,137924 s, 29,7 MB/s


dd if=/dev/zero of=/mnt/ssd-test/test bs=512 count=1000 oflag=direct,dsync
512000 Bytes (512 kB) kopiert, 0,147097 s, 3,5 MB/s

dd if=/dev/zero of=/mnt/ssd-test/test bs=4k count=1000 oflag=direct,dsync
4096000 Bytes (4,1 MB) kopiert, 0,235434 s, 17,4 MB/s

Running fio from a node directly via rbd gives expected results, but also with some serious deviations:
Code:

rbd_iodepth32: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.3-1-gaad9
Starting 1 process
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/13271KB/0KB /s] [0/3317/0 iops] [eta 00m:00s]
rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=849098: Mon Mar 23 20:08:25 2015
  write: io=2048.0MB, bw=12955KB/s, iops=3238, runt=161874msec
    slat (usec): min=37, max=27268, avg=222.48, stdev=326.17
    clat (usec): min=13, max=544666, avg=7937.85, stdev=11891.77
    lat (msec): min=1, max=544, avg= 8.16, stdev=11.88

Thanks for reading so far :-)
I know this is my first post, but I have really run out of options here and would really appreciate your help.

My questions are:
Why is the performance in the guests so much worse?
What can we do to enhance this for Linux as well as Windows guests?

Thanks for reading this big post. I hope we can have a nice discussion with a good outcome for everyone, since, from my point of view, this is a common issue for quite a few users.

ProxMox 3.4 on Blade HP C7000 with fiber Storage

Hi,
I have an HP BladeSystem C7000 with 16 bays (every bay has 8 GB RAM, 2x quad-core CPUs and 1x 72 GB 10k HDD).
In the first two bays I have a RAID controller; those bays are connected via optical fiber to a storage array (14x 10k HDDs, 2x fiber outputs A and B).

In bays 3-15 I run Proxmox 3.4 (ZFS RAID 0).

I have 2 questions:

1) What OS should I use on the first 2 bays so that the other bays (3-16, with Proxmox) would recognise them as storage?

2) Should I set up a cluster, HA or Ceph? Do you have a good setup to recommend?

Proxmox 3.4, OVS and Open VZ - 2 nics, 2 subnets

Hello,

I tried a while ago to get this working with Proxmox 3.3 but could never get things working right. I've just installed 3.4 and was hoping someone had a good example of using OVS and OpenVZ with two NICs.

I have two NICs, each with its own subnet and gateway. I would like to be able to connect to the GUI/SSH on the host, and I don't want the two subnets to have access to each other, as that will be handled by the upstream firewall.

Would it be best to have an additional NIC for the GUI/SSH? How do I properly set up OVS for this type of configuration? Any help would be appreciated, as I'd like to migrate to Proxmox soon.
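
This is roughly the style of /etc/network/interfaces I had in mind (just a sketch; interface names and addresses are made up, and I'm not sure this is the right way to combine the two uplinks):

Code:

allow-vmbr0 eth0
iface eth0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        gateway 192.168.1.1
        ovs_type OVSBridge
        ovs_ports eth0

allow-vmbr1 eth1
iface eth1 inet manual
        ovs_bridge vmbr1
        ovs_type OVSPort

auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports eth1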

Thanks,
Mike

Turnkey OpenVPN in a container, can't make it work.

I hope someone can give me a hand with this.
I installed the TurnKey OpenVPN appliance in a container, configured as a server with bridged networking. I can access the web manager and SSH, but whenever I start the container, port 1194 appears closed to my requests (before, it was open but not responding). Nothing I try makes it available.

Can someone please give me a hand, or point me to a good guide on installing OpenVPN in Proxmox?
Thanks in advance.

ZFS over iSCSI doesn't work

Hi,

I'm trying to set up a ZFS over iSCSI configuration on Proxmox 3.4.
I created the ZFS pool on the server (Ubuntu 12.04) and the iSCSI target (IET).
Config files:
IET:
Code:

Target iqn.2014-10.proxmoxhoz.server:proxmoxzfs01
    Lun 0 Path=/dev/zvol/zfspool01,Type=blockio

I'm sure this is not 100% correct, because:
Code:

iscsi_trgt: blockio_open_path(167) Can't open device /dev/zvol/zfspool01, error -15
kernel: [61709.931807] iscsi_trgt: blockio_attach(294) Error attaching Lun 0 to Target iqn.2014-10.proxmoxhoz.server:proxmoxzfs01
ietd: unable to create logical unit 0 in target 3: 15

Anyway, I only noticed this error later, while trying to find out what the problem could be.

Before that I tried to create a KVM machine.
storage.cfg
Code:

zfs: ZFS01
        blocksize 4k
        target iqn.2014-10.proxmoxhoz.server:proxmoxzfs01
        pool zfspool01
        iscsiprovider iet
        portal 172.18.99.199
        content images
        nowritecache

So, when I want to create a new KVM machine, it dies with this error message:
Code:

TASK ERROR: create failed - 137: Parse error [    Lun 0 Path=/zfspool01] at /usr/share/perl5/PVE/Storage/LunCmd/Iet.pm line 175.
or (mostly)
TASK ERROR: create failed - 138: Parse error [  ] at /usr/share/perl5/PVE/Storage/LunCmd/Iet.pm line 189.

When I check my ZFS pool, the disk (zvol) is created.

Do you have any idea what could cause the problem?

Thanks, Robert

The difference between Quorum and fencing

Hello,

Can you please explain the difference between quorum and fencing?
Can I configure a Proxmox V3.4 cluster with 2 nodes (HA) and a quorum VM, without fencing?
To explain my configuration:
  • I have configured a Proxmox cluster with 2 nodes + one quorum VM.
  • When I try to simulate a network failure with "service networking stop", just to verify that my VM migrates to the other node, I get this error: fence xxxxxx dev 0.0 agent none result: error no method
  • I have only enabled fencing in /etc/default/redhat-cluster-pve.


If fencing must be configured to ensure HA, can I use the same network interface for fencing and for accessing the node?

Thanks for your help ;)

Conversion of vmdk to qcow2. Is this correct?

Hello,

I am trying to convert a VMDK disk file from VMware to Proxmox.
I used the details from this link (https://pve.proxmox.com/wiki/Migrati...x_VE_.28KVM.29), however I am not sure whether the conversion was correct.
I used the command: qemu-img convert -f vmdk original.vmdk -O qcow2 vm-108-disk-1.qcow2

After the conversion was done, I ran qemu-img info and it shows:

image: vm-108-disk-1.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 3.3G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false


Also, the file command gives:
qcow2 vm-108-disk-1.qcow2: QEMU QCOW Image (unknown version)
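
One extra check I thought of running (just my own idea) is the built-in consistency check:

Code:

qemu-img check vm-108-disk-1.qcow2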


Do those seem to be OK or not?

Any help is welcome! :rolleyes:

Ceph - OSD "problem" - the same OSD ID on two nodes

Hi to all!

I have a three-node cluster with Ceph storage, and it works perfectly now.
There are three servers, and every server has 8 hard disks: 1 for Proxmox, 7 for Ceph storage.
Now my problem: normally an OSD ID is unique across the whole three-node cluster. In the past I replaced some hard drives and reformatted them, and now there are two OSDs with the same OSD ID, but only the one on pve2 is shown in the Proxmox GUI.
On pve1 and pve2 there is the same OSD ID 9:

Code:

root@pve1:~# df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                            10M    0  10M  0% /dev
tmpfs                          1.4G  492K  1.4G  1% /run
/dev/mapper/pve-root            34G  1.5G  31G  5% /
tmpfs                          5.0M    0  5.0M  0% /run/lock
tmpfs                          2.8G  59M  2.7G  3% /run/shm
/dev/mapper/pve-data            73G  180M  73G  1% /var/lib/vz
/dev/fuse                      30M  24K  30M  1% /etc/pve
/dev/cciss/c0d6p1              132G  48G  85G  36% /var/lib/ceph/osd/ceph-9
/dev/cciss/c0d2p1              132G  30G  103G  23% /var/lib/ceph/osd/ceph-1
/dev/cciss/c0d4p1              132G  25G  108G  19% /var/lib/ceph/osd/ceph-7
/dev/cciss/c0d5p1              132G  21G  112G  16% /var/lib/ceph/osd/ceph-8
/dev/cciss/c0d7p1              132G  29G  104G  22% /var/lib/ceph/osd/ceph-10
/dev/cciss/c0d3p1              132G  24G  108G  19% /var/lib/ceph/osd/ceph-2
/dev/cciss/c0d1p1              132G  23G  109G  18% /var/lib/ceph/osd/ceph-0


root@pve2:~# df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                            10M    0  10M  0% /dev
tmpfs                        1000M  492K  999M  1% /run
/dev/mapper/pve-root            34G  2.0G  30G  7% /
tmpfs                          5.0M    0  5.0M  0% /run/lock
tmpfs                          2.0G  59M  1.9G  3% /run/shm
/dev/mapper/pve-data            77G  180M  77G  1% /var/lib/vz
/dev/fuse                      30M  24K  30M  1% /etc/pve
/dev/cciss/c0d5p1              132G  30G  102G  23% /var/lib/ceph/osd/ceph-12
/dev/cciss/c0d3p1              132G  22G  111G  17% /var/lib/ceph/osd/ceph-6
/dev/cciss/c0d6p1              132G  18G  115G  13% /var/lib/ceph/osd/ceph-9
/dev/cciss/c0d7p1              132G  20G  112G  16% /var/lib/ceph/osd/ceph-13
/dev/cciss/c0d2p1              132G  20G  112G  15% /var/lib/ceph/osd/ceph-4
/dev/cciss/c0d4p1              132G  23G  109G  18% /var/lib/ceph/osd/ceph-11
/dev/cciss/c0d1p1              132G  19G  114G  14% /var/lib/ceph/osd/ceph-3


How can I fix this problem so I can use the hard disk on pve1 (OSD ID 9)?

The command (on pve1) "pveceph destroyosd 9" doesn't work:

Code:

root@pve1:~# pveceph destroyosd 9
osd is in use (in == 1)
root@pve1:~#
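
My guess (please correct me if this is wrong) is that I first have to mark the OSD as out and stop it before destroying it, something like:

Code:

ceph osd out 9
service ceph stop osd.9
pveceph destroyosd 9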

Did anyone have the same "problem" in the past?

Thanks in advance,

roman

1501-byte packets - problem with MTU on virtual pfSense (Proxmox)

Disclaimer: I posted this on the pfSense boards too, as I don't know whether this is more of a Proxmox or a pfSense issue.

Hello all,

I'm running into some strange problems with too large packets on our WAN interface.

Setup:

- pfSense 2.2 64Bit on Proxmox 3.4 host, 2 cores, 4GB RAM, CPU max 5%
- HW NIC eth1 => WAN, MTU 1500
- HW NIC eth4 = > LAN, MTU 9000
- HW NIC eth2 => LAN, connected to same switch, but not active
- vmbr0, OVS Bridge => eth4 => LAN
- vmbr1, OVS Bridge => eth1 => WAN
- Jumbo Frames on switches enabled
- pfSense MTU WAN If.: 1500
- Clear invalid DF bits instead of dropping the packets: Enabled
- Disable hardware checksum offload: Enabled
- Disable hardware TCP segmentation offload: Enabled
- Disable hardware large receive offload: Enabled
- All other local if's on 9000 MTU
- Storage cluster (Synology): 9000 MTU
- VMs on all proxmox hosts: Default MTU 1500

The log on the Proxmox host tells me:

Code:

...
Mar 24 18:40:46 vmhost1 kernel: __ratelimit: 6 callbacks suppressed
Mar 24 18:40:46 vmhost1 kernel: openvswitch: tap108i7: dropped over-mtu packet: 1501 > 1500
Mar 24 18:40:46 vmhost1 kernel: openvswitch: tap108i7: dropped over-mtu packet: 1501 > 1500
Mar 24 18:40:46 vmhost1 kernel: openvswitch: tap108i7: dropped over-mtu packet: 1501 > 1500
Mar 24 18:40:46 vmhost1 kernel: openvswitch: tap108i7: dropped over-mtu packet: 1501 > 1500
Mar 24 18:40:46 vmhost1 kernel: openvswitch: tap108i7: dropped over-mtu packet: 1501 > 1500
Mar 24 18:40:46 vmhost1 kernel: openvswitch: tap108i7: dropped over-mtu packet: 1501 > 1500
...

tap108i7 is the tap interface on the Proxmox host's OVS bridge for the WAN interface (vtnet7 in pfSense).
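
To watch these frames live on the host, I'm using something along these lines (my own command; this is how I got the capture below):

Code:

# frames larger than a standard 1514-byte Ethernet frame on the tap interface
tcpdump -i tap108i7 -nn -e 'greater 1515'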

I did some packet capturing, which shows that the large packets on the WAN interface come from a virtual IP, i.e. from inside the network:

Code:

Id = 12
Source = 217.76.xxx.xx
Destination = 7x.x.x.xxx
Captured Length = 1506
Packet Length = 1506
Protocol = TCP
Date Received = 2015-03-24 17:28:54 +0000
Time Delta = 0.00888514518737793
Information = HTTP -> 58826 ([ACK], Seq=4188548632, Ack=3381854676, Win=243)

The source IP is a public IP from our pool, currently NATed to a VM on another Proxmox host on the same network.
The destination is some random public IP (not ours).

Any idea why these large packets are being generated? Where do they come from? How do I stop them?

The VMs "behind" the pfSense are on multiple VLANs, each with its own DHCP server. The VLANs are created on the switches and assigned to the pfSense's virtual NICs. Should I set the VMs' MTU to 9000 too, since they are on the local networks (the public IPs are NATed on the pfSense and not directly assigned to the VMs)?

Thanks
Sebastian

Some help new to proxmox nat

Hi there, I'm new to this scheme. How can we join a new Proxmox node that is behind NAT (e.g. 10.1.0.163, with a public IP 190.210.XXX.XXX) to another node that is in another datacenter with a public IP? When we run "pvecm add 190.210.XXX.XXX" on the new node, it stops at quorum. Any idea? Thanks.