Category archive: Technical Resources

Technical resources: some useful open-source projects.

npm_lazy: a local cache for npm

For the full installation procedure, see the official site:
1.http://mixu.net/npm_lazy/#installation
2.http://mixu.net/npm_lazy/#configuration

1. Install
npm install -g npm_lazy
2. Edit the configuration file, located at AppData\Roaming\npm\node_modules\npm_lazy\config.js.
  // external url to npm_lazy, no trailing /
  externalUrl: 'http://localhost:8972',
  // registry url with trailing /
  // remoteUrl: 'https://registry.npmjs.com/',
  remoteUrl: 'https://registry.npm.taobao.org/',
  // bind port and host
  port: 8972,
  host: '0.0.0.0',
3. Start the service
npm_lazy
4. Point npm's registry at the local npm_lazy instance:
 # npm config set registry https://registry.npm.taobao.org
 npm config set registry http://localhost:8972
5. Run npm install

Check that the install succeeds.

Elasticsearch data persistence

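Elasticsearch data persistence: in this setup, index data was not kept by default and usually disappeared within a few minutes. The settings below, placed in elasticsearch.yml, save the index data to an explicit path.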

# Different clusters must not share the same name.
cluster.name: es_vm_test
node.name: vmmaster
network.host: 0.0.0.0
http.port: 9200
# Where index data and logs are stored
path.data: /home/abc/elk-5.5.1/elkdata/data
path.logs: /home/abc/elk-5.5.1/elkdata/log
# Disable login authentication
xpack.security.enabled: false

Collecting nginx logs with Logstash

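The following was tested and verified on the open-source ELK and LNMP stacks.
You can also refer to the approaches described in the online documentation: https://kibana.logstash.es/content/logstash/plugins/codec/json.html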
https://kibana.logstash.es/content/logstash/plugins/codec/multiline.html
The documentation covers many more use cases:
https://kibana.logstash.es/content/
https://kibana.logstash.es/content/logstash/examples/

1. Collect the nginx logs

input {
    file {
        # path => ["/home/wwwlogs/h5.vim.vim.com.log", "/home/wwwlogs/h5.vim.vim.com2.log"]
        path => "/home/wwwlogs/h5.vim.vim.com.log"
        exclude => "*.zip"
        type => "java"
        add_field => [ "domain", "h5.vim.vim.com" ]
        # Lines that start with whitespace are appended to the previous event
        codec => multiline {
            pattern => "^\s+"
            what => previous
        }
    }
    file {
        # path => ["/home/wwwlogs/h5.api.vim.vim.com.log", "/home/wwwlogs/h5.api.vim.vim.com2.log"]
        path => "/home/wwwlogs/h5.api.vim.vim.com.log"
        exclude => ["*.zip", "*.gz"]
        type => "java"
        add_field => [ "domain", "h5.api.vim.vim.com" ]
        # Lines that start with whitespace are appended to the previous event
        codec => multiline {
            pattern => "^\s+"
            what => previous
        }
    }
}
filter {

}
output {
    # Print each event to the console for debugging
    stdout {
        codec => rubydebug
    }
    # One index per domain per day
    elasticsearch {
        hosts => ["0.0.0.0:9200"]
        index => "logstash-%{domain}-%{+YYYY.MM.dd}"
    }
}
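
To make the multiline setting easier to picture, here is a minimal Python sketch of the same grouping rule (pattern "^\s+", what => previous). It is only an illustration of the idea, not Logstash's implementation; the function name merge_continuation_lines and the sample log lines are made up for this example.

```python
import re


def merge_continuation_lines(lines, pattern=r"^\s+"):
    """Append every line matching `pattern` to the previous event.

    Hypothetical helper that mimics the grouping rule only; a matching
    line with no previous event simply starts a new event.
    """
    continuation = re.compile(pattern)
    events = []
    for line in lines:
        if events and continuation.search(line):
            # Continuation line: glue it onto the previous event.
            events[-1] += "\n" + line
        else:
            # Any other line starts a new event.
            events.append(line)
    return events


# Made-up sample: a log line followed by an indented stack trace.
sample = [
    "ERROR something failed",
    "    at com.example.Foo.bar(Foo.java:10)",
    "    at com.example.Main.main(Main.java:5)",
    "INFO next request",
]
events = merge_continuation_lines(sample)
for event in events:
    print(repr(event))
```

The indented stack-trace lines end up inside the ERROR event, so the four input lines become two events.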

2. Periodically clean up old indices

#!/bin/bash

# --------------------------------------------------------------
# This script is to delete ES indices older than specified days.
# Version: 1.0
# --------------------------------------------------------------

function usage() {
        echo "Usage: `basename $0` -s ES_SERVER -d KEEP_DAYS [-w INTERVAL]"
        # Stop here so the script never runs with missing or invalid arguments
        exit 1
}


PREFIX='logstash-'
WAITTIME=2
NOW=`date  +%s.%3N`
LOGPATH=/apps/logs/elasticsearch


while getopts d:s:w: opt
do
        case $opt in
        s) SERVER="$OPTARG";;
        d) KEEPDAYS="$OPTARG";;
        w) WAITTIME="$OPTARG";;
        *) usage;;
        esac
done

if [ -z "$SERVER" -o -z "$KEEPDAYS" ]; then
        usage
fi

if [ ! -d $LOGPATH ]; then
        mkdir -p $LOGPATH
fi


# Quote the URL so the shell does not treat "?" as a glob; escape the dots in the date pattern
INDICES=`curl -s "$SERVER/_cat/indices?h=index" | grep -P '^logstash-.*\d{4}\.\d{2}\.\d{2}' | sort`
for index in $INDICES
do
        date=`echo $index | awk -F '-' '{print $NF}' | sed 's/\./-/g' | xargs -I{} date -d {} +%s.%3N`
        delta=`echo "($NOW-$date)/86400" | bc`
        if [ $delta -gt $KEEPDAYS ]; then
                echo "deleting $index" | tee -a $LOGPATH/es_delete_indices.log
                curl -s -XDELETE $SERVER/$index | tee -a $LOGPATH/es_delete_indices.log
                echo | tee -a $LOGPATH/es_delete_indices.log
                sleep $WAITTIME
        fi
done
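
The retention check in the loop above can be sketched in Python to make the date arithmetic explicit. This is an illustration under assumptions, not a replacement for the script: index names are assumed to end in -YYYY.MM.dd as produced by the Logstash output above, ages are counted in whole calendar days, and the function names are hypothetical.

```python
from datetime import date


def index_age_days(index_name, today):
    """Age in whole days of an index named like 'logstash-<domain>-YYYY.MM.dd'."""
    # The date is the part after the last '-', e.g. '2017.08.01'.
    suffix = index_name.rsplit("-", 1)[-1]
    year, month, day = (int(part) for part in suffix.split("."))
    return (today - date(year, month, day)).days


def should_delete(index_name, keep_days, today):
    """Mirror the script's '-gt' test: delete only when strictly older than keep_days."""
    return index_age_days(index_name, today) > keep_days


# Hypothetical index name and reference date, for illustration only.
name = "logstash-h5.vim.vim.com-2017.08.01"
today = date(2017, 8, 31)
print(index_age_days(name, today), should_delete(name, 7, today))
```

Because the comparison is strict, an index that is exactly KEEP_DAYS old is kept for one more day.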

DPDK, the network development kit from Intel

The DPDK network development kit, released by Intel:
http://dpdk.org/

DPDK is a set of libraries and drivers for fast packet processing.
It is designed to run on any processor. The first supported CPU was Intel x86, and support has since been extended to IBM POWER and ARM.
It runs mostly in Linux userland. A FreeBSD port is available for a subset of DPDK features.
DPDK is an Open Source BSD licensed project. The most recent patches and enhancements, provided by the community, are available in master branch.
Main libraries

multicore framework
huge page memory
ring buffers
poll-mode drivers for networking, crypto and eventdev

Database performance stress testing

mysqltest]$ mysqlslap -S /tmp/mysqltest/mysql.sock -uroot --create=/tmp/mysqltest/user_cart.sql --create-schema=test --query=/tmp/mysqltest/cart1.sql --concurrency=1024 --iterations=3
Benchmark
Average number of seconds to run all queries: 59.774 seconds
Minimum number of seconds to run all queries: 59.570 seconds
Maximum number of seconds to run all queries: 60.040 seconds
Number of clients running queries: 1024
Average number of queries per client: 954
QPS=16343
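
The QPS figure is consistent with dividing the total number of queries by the average run time. The short check below assumes that is how it was derived (the source does not say) and uses only the numbers reported above.

```python
# Figures taken from the mysqlslap output above.
clients = 1024
queries_per_client = 954
average_seconds = 59.774

# Assumed derivation: total queries across all clients / average run time.
total_queries = clients * queries_per_client
qps = total_queries / average_seconds

print(total_queries)
print(int(qps))
```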

Some machine learning libraries

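Gensim is a fairly specialized Python toolkit for computing similarity.
In text processing, for example when mining product reviews, you sometimes need to know how similar each review is to the product description, as a measure of how objective the review is.
The more similar a review is to the product description, the more official its wording: it carries little emotional coloring, focuses on the product's attributes and features, and takes a more objective angle.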
http://radimrehurek.com/gensim/
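
To illustrate the review-versus-description idea without depending on Gensim, here is a plain-Python bag-of-words cosine similarity. It is a simplified stand-in, not Gensim's API, and the product description and reviews are made-up sample strings.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample data for illustration.
description = "wireless mouse with usb receiver and optical sensor"
objective_review = "wireless mouse usb receiver works optical sensor accurate"
emotional_review = "i love it so much best purchase ever"

print(cosine_similarity(description, objective_review))
print(cosine_similarity(description, emotional_review))
```

The review that reuses the description's vocabulary scores higher, which is the signal the paragraph above describes. A real pipeline would normally add tokenization suited to the language and a weighting scheme such as TF-IDF.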

————————————-
Image recognition library
https://github.com/tesseract-ocr/tesseract

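tesseract-ocr, the image recognition library originally developed by HP, has been updated to 2.04. It is the OCR engine that Google has recently been backing. HP wrote it originally, and it is now open source.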