PHP SimpleXMLElement 对象

今天在用RedbeanPHP 写数据库接口的时候,发现从SimpleXMLElement对象里面出来的值,按理说都是string, 怎么也不能赋值给RedbeanPHP, 用var_dump看了一下,发现SimpleXMLElement对象比较有意思,里面嵌套的对象以及property 的type 都是object,而且都是SimpleXMLElement 对象,在google上搜了一下发现问这个问题的不少。

https://stackoverflow.com/questions/416548/forcing-a-simplexml-object-to-a-string-regardless-of-context

比如说XML是这样的:

<channel>
<item>
<title>This is title 1</title>
</item>
</channel>

下面这样确实能够输出string:

$xml = simplexml_load_string($xmlstring);
echo $xml->channel->item->title;

但是除了echo 以外,下面这样就不能被当成string了

$foo = array( $xml->channel->item->title );

这是因为$XML->channel->item->title 的type 其实仍然为SimpleXMLElement的对象

我们可以用typecast来解决这个问题:

$foo = array( (string) $xml->channel->item->title );

The above code internally calls __toString() on the SimpleXMLObject. This method is not publicly available, as it interferes with the mapping scheme of the SimpleXMLObject, but it can still be invoked in the above manner.

另外看到有人这么写也可以:

$foo = array( $xml->channel->item->title.'' );

通过gettype查看确实变成了string,具体原理不知道

利用AWK来分析nginx日志

分析nginx日志有各种各样的可视化工具,但是这样比较繁琐,需要安装和配置,大部分的时候我们只需要简单的分析,这里awk 完全可以满足我们的需求。

  1. 统计日志中访问最多的10个ip

 方法一

awk '{a[$1]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log

方法二

awk '{print $1}' access.log |sort |uniq -c |sort -k1 -nr |head -n10

 

2. 统计日志中访问大于100次的IP

方法一

awk '{a[$1]++}END{for(i in a){if(a[i]>100)print i,a[i]}}' access.log

方法二

awk '{a[$1]++;if(a[$1]>100){b[$1]++}}END{for(i in b){print i,a[i]}}' access.log

3. 统计2019年12月24日一天内访问最多的10个IP

方法一

awk '$4>="[24/Dec/2019:00:00:01" && $4<="[24/Dec/2019:23:59:59" {a[$1]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log

方法二

sed -n '/\[24\/Dec\/2019:00:00:01/,/\[24\/Dec\/2019:23:59:59/p' access.log |sort |uniq -c |sort -k1 -nr |head -n10

4. 统计访问最多的前10个页面 ($request)

awk '{a[$7]++}END{for(i in a)print a[i],i|"sort -k1 -nr|head -n10"}' access.log

5. 统计蜘蛛抓取次数

grep 'Baiduspider' access.log |wc -l

统计蜘蛛抓取404的次数

grep 'Baiduspider' access.log |grep '404' | wc -l

 

 

Cloudflare 配合nginx 来禁止某些国家, user agent 的访问

其实目前cloudflare 的免费版已经在control panel 里面可以完美的实现这些功能,但是cloudflare 能做到的只是禁止某些国家或者user agent 的访问,如果我们想更好的优化这些流量,比如说对不同的geo跳转到不同的网站,或者对于某些蜘蛛来说显示不同的网站,那么就需要nginx 的配合了, 在这里我们主要利用的是nginx 的map 这个功能.

对于geo 的控制

如果你的网站在cloudflare 的保护下,那么cloudflare 默认会在header里面加上’HTTP_CF_IPCOUNTRY’, 这就相当于cloudflare 提供了免费的IP数据库,利用nginx 的map 功能,比如说禁止来自US, CA, UK, AU的流量, 那就就可以按照如下配置:

在nginx 的全局中

map $http_cf_ipcountry $allow {
default yes;
US no;
CA no;
UK no;
AU no;
}

在server中

if ($allow = no) {
return 403;
}

在这里需要注意的是,按照nginx 的官方文档,map 只能位于http block里面,return 就比较自由了,可以在server,location block里面

深入一点,return的可以不仅仅是403,还可以是301, 302等等,比如说把US,CA,UK和 AU 的流量跳转到google,可以这么配置:

if ($allow = no) {
return 301 https://www.google.com;
}

在更加深入一点,活用map功能,我们可以不仅仅是map GEO,还可以map user agent等等,这里就需要正则的配合了.

Centos 7 编译安装nginx

基本说明见这里: https://www.iamhippo.com/2019-12/1167.html

1) update system and install building software

yum clean all
yum update -y

disable selinux

vi /etc/selinux/config
reboot
yum groupinstall -y 'Development Tools'

2) add nginx username and group, identical to the one nginx offical repo creates:

useradd --system --home /var/cache/nginx --shell /sbin/nologin --comment "nginx user" --user-group nginx

 check user and group created

3) download nginx dependencies source code

cd ~
wget https://ftp.pcre.org/pub/pcre/pcre-8.43.tar.gz && tar xzvf pcre-8.43.tar.gz
wget https://www.zlib.net/zlib-1.2.11.tar.gz && tar xzvf zlib-1.2.11.tar.gz
wget http://www.openssl.org/source/openssl-1.1.1d.tar.gz && tar xzvf openssl-1.1.1d.tar.gz
wget https://nginx.org/download/nginx-1.16.1.tar.gz && tar zxvf nginx-1.16.1.tar.gz
cd nginx-1.16.1
./configure --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib64/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-pcre=../pcre-8.43 --with-pcre-jit --with-zlib=../zlib-1.2.11 --with-openssl=../openssl-1.1.1d --with-openssl-opt=no-nextprotoneg --with-debug
make
make install

# go to home

cd ~

# symlink /usr/lib64/nginx/modules to /etc/nginx/modules

ln -s /usr/lib64/nginx/modules /etc/nginx/modules
mkdir /var/cache/nginx -p
mkdir /etc/nginx/vhost -p
vi /usr/lib/systemd/system/nginx.service
[Unit]
Description=nginx - high performance web server
Documentation=http://nginx.org/en/docs/
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -c /etc/nginx/nginx.conf
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

reload systemd daemon

systemctl daemon-reload

start service and auto boot

systemctl start nginx
systemctl enable nginx

check if nginx startup on reboot

systemctl is-enabled nginx.service

check if nginx is running

systemctl status nginx

Nginx config

对于在centos 7, debian 9/10 上自己编译的nginx来说,默认的nginx 配置有点弱,于是根据军哥的LNMP的配置,我做了一些修改. 以后凡是自己按照nginx官方repo 的configure编译的nginx,都可以使用如下的nginx.conf,需要在nginx.conf 所在的目录设置一个vhost目录, 所有的individual host 的配置,都放在vhost里面,方便管理. 

user  nginx nginx;

worker_processes auto;
worker_cpu_affinity auto;

error_log  /var/log/nginx/error.log  crit;

pid        /var/run/nginx.pid;

#Specifies the value for maximum file descriptors that can be opened by this process.
worker_rlimit_nofile 51200;

events
    {
        use epoll;
        worker_connections 51200;
        multi_accept off;
        accept_mutex off;
    }

http
    {
        include       mime.types;
        default_type  application/octet-stream;

        server_names_hash_bucket_size 128;
        client_header_buffer_size 32k;
        large_client_header_buffers 4 32k;
        client_max_body_size 50m;
		
        proxy_buffer_size 128k;
        proxy_buffers 4 256k;
        proxy_busy_buffers_size 256k;

        sendfile on;
        sendfile_max_chunk 512k;
        tcp_nopush on;

        keepalive_timeout 60;

        tcp_nodelay on;

        fastcgi_connect_timeout 300;
        fastcgi_send_timeout 300;
        fastcgi_read_timeout 300;
        fastcgi_buffer_size 64k;
        fastcgi_buffers 4 64k;
        fastcgi_busy_buffers_size 128k;
        fastcgi_temp_file_write_size 256k;

        gzip on;
        gzip_min_length  1k;
        gzip_buffers     4 16k;
        gzip_http_version 1.1;
        gzip_comp_level 2;
        gzip_types     text/plain application/javascript application/x-javascript text/javascript text/css application/xml application/xml+rss;
        gzip_vary on;
        gzip_proxied   expired no-cache no-store private auth;
        gzip_disable   "MSIE [1-6]\.";

        #limit_conn_zone $binary_remote_addr zone=perip:10m;
        ##If enable limit_conn_zone,add "limit_conn perip 10;" to server section.

        server_tokens off;
        access_log off;

server
    {
        listen 80 default_server;
        #listen [::]:80 default_server ipv6only=on;
        server_name _;
        index index.html index.htm;
        root  html;

        #error_page   404   /404.html;

        # Deny access to PHP files in specific directory
        #location ~ /(wp-content|uploads|wp-includes|images)/.*\.php$ { deny all; }

        #include enable-php.conf;

        location /nginx_status
        {
            stub_status on;
            access_log   off;
        }

        location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$
        {
            expires      30d;
        }

        location ~ .*\.(js|css)?$
        {
            expires      12h;
        }

        location ~ /.well-known {
            allow all;
        }

        location ~ /\.
        {
            deny all;
        }

#       access_log  /home/wwwlogs/access.log;
    }
include vhost/*.conf;
}

Debian 9/10 编译安装nginx

Debian 9, Debian 10 都适用于此教程.

实际上,Nginx 的官方repo编译的nginx,已经把能加上的module全部都加上了,因此在一般情况下,建议使用nginx的官方repo来安装nginx. 但是如果说你想添加第三方的module,或者使用最新的openssl 的话,在或者更改一下nginx 的安装路径的话,就需要自己编译了. 此篇教程尽量按照nginx官方repo的configure来编译安装openssl.

在一台全新安装的Debian 9或者Debian 10上:

更多

Debian 9 通过nginx repo 安装的nginx的配置解析

鉴于目前很多cloud 和 vps 的服务商还不提供debian 10 的模板,因此这里就先研究debian 9 的

[email protected]:~# nginx -V
nginx version: nginx/1.17.6
built by gcc 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)
built with OpenSSL 1.1.0k 28 May 2019 (running with OpenSSL 1.1.0l 10 Sep 2019)
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fdebug-prefix-map=/data/builder/debuild/nginx-1.17.6/debian/debuild-base/nginx-1.17.6=. -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fPIC' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie'

debian 9 自带openssl

[email protected]:~# openssl version
OpenSSL 1.1.0l 10 Sep 2019

debian 9 自带openssl的版本正是1.1.0l, 但是编译的是用1.1.0k

debian 9 用apt install gcc 以后,得到的版本是:

[email protected]:~# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 6.3.0-18+deb9u1' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)

正是我们用来编译nginx的版本,看来nginx 在debian 9下的编译,都是使用repo 自带的版本,没有使用任何特殊的版本

下面看下user group:

nginx:x:110:113:nginx user,,,:/nonexistent:/bin/false

看看user group history:

[email protected]:/var/log# grep nginx auth.log
Dec 11 09:08:55 vultr groupadd[1614]: group added to /etc/group: name=nginx, GID=113
Dec 11 09:08:55 vultr groupadd[1614]: group added to /etc/gshadow: name=nginx
Dec 11 09:08:55 vultr groupadd[1614]: new group: name=nginx, GID=113
Dec 11 09:08:55 vultr useradd[1620]: new user: name=nginx, UID=110, GID=113, home=/nonexistent, shell=/bin/false
Dec 11 09:08:55 vultr chage[1625]: changed password expiry for nginx
Dec 11 09:08:55 vultr chfn[1628]: changed user 'nginx' information

可以看到nginx user 和 group 被加入了system user 和group,因为从/etc/login.defs 我们可以看到:

# Min/max values for automatic uid selection in useradd
#
UID_MIN 1000
UID_MAX 60000
# System accounts
#SYS_UID_MIN 100
#SYS_UID_MAX 999

#
# Min/max values for automatic gid selection in groupadd
#
GID_MIN 1000
GID_MAX 60000
# System accounts
#SYS_GID_MIN 100
#SYS_GID_MAX 999

看看nginx service 的配置:

[email protected]:~# nano /lib/systemd/system/nginx.service

[Unit]
Description=nginx - high performance web server
Documentation=http://nginx.org/en/docs/
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
Type=forking
PIDFile=/var/run/nginx.pid
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

这个systemd 的配置和centos 的一模一样

Nginx 前挂Cloudflare 后的日志分析

将nginx服务器隐藏在cloudflare 服务后端,在nginx 的默认access log里面显示的IP 都是cloudflare,因此我们需要把日志中的访问IP改成真正的用户IP. 有很多种办法可以实现,但是下面的这种办法应该是最简单的.

Cloudflare 用X-Forwarded-For这个header 来传递用户的真实IP,因此我们只需要在nginx 的conf中,设置一个新的nginx log format就可以.

默认的nginx access log format 是:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"'; 

我们可以添加一个新的log format:

log_format csf '$http_x_forwarded_for - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

access log 可以设置成类似于这样的:

access_log /your/access/log/path csf;

这样,用户的真实IP就会展现在access log里面了