Category archive: docker

Containerized ZooKeeper configuration

docker pull zookeeper
https://github.com/getwingm/kafka-stack-docker-compose

version: '3.1'
 
services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
 
  zoo2:
    image: zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181
 
  zoo3:
    image: zookeeper
    restart: always
    hostname: zoo3
    ports:
      - 2183:2181
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
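
The three ZOO_SERVERS values above follow one pattern: every node lists all members, but replaces its own hostname with 0.0.0.0. A minimal Python sketch of that pattern is shown here; the function name zoo_servers and its parameters are made up for illustration and are not part of the zookeeper image.

```python
def zoo_servers(my_id, hosts, peer_port=2888, election_port=3888, client_port=2181):
    """Build the ZOO_SERVERS string for the node whose ZOO_MY_ID is my_id.

    hosts is the ordered list of hostnames; server ids start at 1.
    The node's own entry uses 0.0.0.0, as in the compose file above.
    """
    entries = []
    for server_id, host in enumerate(hosts, start=1):
        address = "0.0.0.0" if server_id == my_id else host
        entries.append(
            f"server.{server_id}={address}:{peer_port}:{election_port};{client_port}"
        )
    return " ".join(entries)


if __name__ == "__main__":
    # Print the value each of the three services would need.
    for node_id in (1, 2, 3):
        print(node_id, zoo_servers(node_id, ["zoo1", "zoo2", "zoo3"]))
```

This only generates the strings; pasting them into the compose file is still a manual step.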

Spark SQL in practice

KMR
1. Log in to KMR.
2. Switch to the spark account [su - spark].
3. Start the spark-shell command-line interface:

spark-shell --master=yarn

4. Common commands:

spark.sql("create external table bhabc(`userid` bigint,`id` int,`date` string,`count` bigint,`opcnt` int,`start` int,`end` int) partitioned by (dt string) row format delimited fields terminated by ','  stored as sequencefile location '/data/behavior/bh_abc_dev'").show
spark.sql("show tables").show
spark.sql("show databases").show
spark.sql("show partitions bhwps").show
spark.sql("alter table bhwps add partition(dt='2019-05-21')").show
spark.sql("select * from bhwps where dt between '2019-05-15' and '2019-05-31' order by `count` desc").show
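spark.sql("alter table bhwps add partition(dt='2019-06-22') partition(dt='2019-06-23')").show  // adds several partitions in one statement
spark.sql("msck repair table bhwps").show  // repairs partitions, i.e. resyncs the metastore with the partition directories on HDFS
spark.sql("show partitions bhraw").show(100,false)  // shows up to 100 rows, untruncated, instead of the default 20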
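
When a whole range of days has to be added, the multi-partition alter table statement can be generated instead of typed. The following is a small Python sketch under the assumption that partitions are daily and keyed by dt in YYYY-MM-DD form, as in the commands above; add_partitions_sql is a hypothetical helper name.

```python
from datetime import date, timedelta


def add_partitions_sql(table, start, end):
    """Return one 'alter table ... add partition ...' statement covering
    every day from start to end (both inclusive)."""
    days = (end - start).days
    if days < 0:
        raise ValueError("end must not be before start")
    clauses = [
        f"partition(dt='{start + timedelta(days=offset)}')"
        for offset in range(days + 1)
    ]
    return f"alter table {table} add " + " ".join(clauses)


if __name__ == "__main__":
    # The printed statement can be pasted into spark.sql("...").
    print(add_partitions_sql("bhwps", date(2019, 6, 22), date(2019, 6, 23)))
```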

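5. Common problems:
> Directory permissions
Use hdfs dfs -chown -R <owner>:<group> /path to change the ownership of a directory recursively (the recursive flag is an uppercase -R, and the new owner has to be given).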

Empty the trash
hdfs dfs -expunge

Alibaba Cloud private Docker registry mirror

vim /etc/docker/daemon.json

https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors

{
  "bip": "192.168.55.1/24",
  "registry-mirrors": ["https://2na48vbddcw.mirror.aliyuncs.com"]
}

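The mirror ID above is not my real one: I cut characters out of the ID I normally use (down to only 8 letters), so substitute your own.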
sudo systemctl daemon-reload
sudo systemctl restart docker
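
Docker will not start if daemon.json is not valid JSON, and typographic quotes copied from a web page are an easy way to break it. The sketch below, in Python, renders the two settings used here and checks that a text parses as JSON; render_daemon_json and is_valid_json are hypothetical helper names, and the mirror URL is the placeholder one from this post.

```python
import json


def render_daemon_json(bip, mirrors):
    """Render the two daemon.json settings used in this post as JSON text."""
    config = {"bip": bip, "registry-mirrors": list(mirrors)}
    return json.dumps(config, indent=2)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    text = render_daemon_json(
        "192.168.55.1/24", ["https://2na48vbddcw.mirror.aliyuncs.com"]
    )
    print(text)
    print(is_valid_json(text))
```

Running the check on the file content before restarting docker avoids a failed restart caused by a stray curly quote.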

Common Hive commands for SequenceFile

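A SequenceFile is a set of key-value pairs. In practice, when Hive creates a table on top of one, the key carries no meaning; Hive splits each record into columns based only on the format of the value.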
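
To illustrate that statement, here is a rough Python model of how a delimited-text table reads one SequenceFile record: the key is dropped and the value is split on the field delimiter. It is a sketch, not Hive's code; the column list is taken from the sfgz table used in this post, and to_row is a hypothetical name.

```python
COLUMNS = ["idx", "userid", "flag", "count", "value", "memo"]


def to_row(record, delimiter=","):
    """Map one (key, value) record to a row dict.

    The key is ignored. Missing trailing fields become None (NULL in Hive)
    and extra fields are dropped.
    """
    _key, value = record
    fields = value.split(delimiter)
    fields = (fields + [None] * len(COLUMNS))[: len(COLUMNS)]
    return dict(zip(COLUMNS, fields))


if __name__ == "__main__":
    # The key can be anything; it does not influence the resulting row.
    print(to_row((b"ignored-key", "idx3,uid6,5,6,34.7,uid3test2")))
```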
0. Log in to the container and connect to Hive

docker-compose -f docker-compose-hive.yml exec hive-server  bash
/opt/hive/bin/beeline -u jdbc:hive2://localhost:10000

1. Create the table

 
create external table sfgz(
     `idx` string,
     `userid` string,
     `flag` string,
     `count` string,
     `value` string,
     `memo` string)
  partitioned by (dt string)
  row format delimited fields terminated by ','
  stored as sequencefile
  location '/user/sfgz';

2. Load partitions
Method 1:
hadoop fs -mkdir -p /user/sfgz/dt=2010-05-06/
hadoop fs -put /tools/mytest.txt.sf /user/sfgz/dt=2019-05-17
hadoop fs -put /tools/mytest.txt.sf /user/sfgz/dt=2010-05-04
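Files uploaded this way are not recognized by Hive directly: each partition must first be registered in the metastore with alter table ... add partition before it can be queried. Each target partition directory also has to exist before the -put (create it with -mkdir -p, as in the first line); otherwise -put writes a plain file with that name.
Method 2, queryable as soon as the load finishes:
load data local inpath '/tools/mytest.txt.sf' into table sfgz partition(dt='2009-03-01');
load data local inpath '/tools/mytest.gzip.sf' into table sfgz partition(dt='2000-03-02');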
3. Check partition information:
show partitions sfgz;
4. Add a partition:
alter table sfgz add partition(dt='2000-03-03');
5. Insert a record:

   insert into sfgz partition(dt='2019-05-16')values('idx3','uid6','5','6','34.7','uid3test2');

6. Counting:
-- not supported on KMR:
select count(*) from sfgz;
-- the only form that works on KMR:
select count(idx) from sfgz;
7. Other common commands
show databases;
use <database>;
show tables;
select * from sfgz where dt='2000-03-03';
-- partition repair command:
msck repair table sfgz;
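
For method 1 of step 2, the partition name is already encoded in the HDFS directory name, so the matching alter table statement can be derived from the path. The following Python sketch assumes a single partition column and the Hive-style column=value directory naming shown above; partition_statement is a hypothetical helper.

```python
import re

# Matches the last path component when it has the form column=value.
PARTITION_DIR = re.compile(r"/(?P<column>[^/=]+)=(?P<value>[^/]+)/?$")


def partition_statement(table, path):
    """Turn a partition directory such as /user/sfgz/dt=2010-05-06/
    into the statement that registers it with Hive."""
    match = PARTITION_DIR.search(path)
    if match is None:
        raise ValueError(f"not a partition directory: {path}")
    return (
        f"alter table {table} add partition"
        f"({match['column']}='{match['value']}');"
    )


if __name__ == "__main__":
    print(partition_statement("sfgz", "/user/sfgz/dt=2010-05-06/"))
```

msck repair table does the same discovery for every directory under the table location in one go, so this is mainly useful when only specific partitions should be added.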

Trying out docker-hive

1. Clone the docker-hive repository, https://github.com/big-data-europe/docker-hive.git, and set it up.
2. Edit its docker-compose.yml to add a volume mapping to every container.

version: "3"
 
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    volumes:
      - /data/namenode:/hadoop/dfs/name
      - /data/tools:/tools
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - "50070:50070"
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    volumes:
      - /data/datanode:/hadoop/dfs/data
      - /data/tools:/tools
    env_file:
      - ./hadoop-hive.env
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    ports:
      - "50075:50075"
  hive-server:
    image: bde2020/hive:2.3.2-postgresql-metastore
    volumes:
      - /data/tools:/tools
    env_file:
      - ./hadoop-hive.env
    environment:
      HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
      SERVICE_PRECONDITION: "hive-metastore:9083"
    ports:
      - "10000:10000"
  hive-metastore:
    image: bde2020/hive:2.3.2-postgresql-metastore
    volumes:
      - /data/tools:/tools
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    environment:
      SERVICE_PRECONDITION: "namenode:50070 datanode:50075 hive-metastore-postgresql:5432"
    ports:
      - "9083:9083"
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.3.0
    volumes:
      - /data/tools:/tools
 
  presto-coordinator:
    image: shawnzhu/prestodb:0.181
    volumes:
      - /data/tools:/tools
    ports:
      - "8080:8080"

2. Create the test text file (the later steps load it as /tools/example.txt)

1,xiaoming,book-TV-code,beijing:chaoyang-shagnhai:pudong
2,lilei,book-code,nanjing:jiangning-taiwan:taibei
3,lihua,music-book,heilongjiang:haerbin
3,lihua,music-book,heilongjiang2:haerbin2
3,lihua,music-book,heilongjiang3:haerbin3

3. Start the services and connect to Hive.

docker-compose up -d
docker-compose exec hive-server bash
/opt/hive/bin/beeline -u jdbc:hive2://localhost:10000


4. Create an external table

create external table t2(
    id      int
   ,name    string
   ,hobby   array<string>
   ,add     map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/t2';
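
The three delimiters in this table definition nest: ',' separates fields, '-' separates collection items (array elements and map entries), and ':' separates a map key from its value. The Python sketch below applies the same rules to one line of the test file to show which columns result; it is an illustration of the splitting, not Hive's parser, and parse_t2_line is a hypothetical name.

```python
def parse_t2_line(line):
    """Split one line of the test file the way table t2 is declared:
    fields on ',', collection items on '-', map keys on ':'."""
    id_text, name, hobby_text, add_text = line.split(",")
    hobby = hobby_text.split("-")
    add = dict(item.split(":", 1) for item in add_text.split("-"))
    return {"id": int(id_text), "name": name, "hobby": hobby, "add": add}


if __name__ == "__main__":
    row = parse_t2_line("2,lilei,book-code,nanjing:jiangning-taiwan:taibei")
    print(row)
```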


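5. Upload the file into the directory from the previous step.
Method 1: in the Hive beeline terminal, use one of:
-- deletes all files already in the table directory, then writes the new file:
load data local inpath '/tools/example.txt' overwrite into table t2;
-- adds the new file to the directory (the only difference is overwrite):
load data local inpath '/tools/example.txt' into table t2;
Method 2: upload with hadoop fs -put.
hadoop fs -put /tools/example.txt /user/t2  # the file name is unchanged
hadoop fs -put /tools/example.txt /user/t2/1.txt  # the file is stored as 1.txt
6. Verify in the Hive command line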

-- run once after each upload:
select * from t2;


7. The newly uploaded files can also be browsed in the Hadoop file browser.

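Observed in this test: the records of one file showed up only once. Note that Hive itself does not deduplicate rows on read, so uploading the same data again under a different file name does produce duplicate rows.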

——————————————-
What about a SequenceFile?
1. Inspect the contents of the SequenceFile.
hadoop fs -Dfs.default.name=file:/// -text /tools/mytest.gzip.sf  # deprecated: fs.default.name has been superseded by fs.defaultFS
hadoop fs -Dfs.defaultFS=file:/// -text /tools/mytest.txt.sf

The actual contents are:

2. Create the table

  create external table sfgz(
     `idx` string,
     `userid` string,
     `flag` string,
     `count` string,
     `value` string,
     `memo` string)
  partitioned by (dt string)
  row format delimited fields terminated by ','
  stored as sequencefile
  location '/user/sfgz';

3. Upload the files

Method 1:
hadoop fs -mkdir -p /user/sfgz/dt=2010-05-06/
hadoop fs -put /tools/mytest.txt.sf /user/sfgz/dt=2019-05-17
hadoop fs -put /tools/mytest.txt.sf /user/sfgz/dt=2010-05-04
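With this method the partitions still have to be reloaded manually before they can be queried; the reload command is: msck repair table sfgz;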
Method 2 (the data can be queried right away):
load data local inpath '/tools/mytest.txt.sf' into table sfgz partition(dt='2009-03-01');
load data local inpath '/tools/mytest.gzip.sf' into table sfgz partition(dt='2000-03-02');

GitHub repositories for Spark/Hive images

Big Data Europe
The most reliable templates I have found so far
https://github.com/big-data-europe/docker-spark
https://github.com/big-data-europe/docker-hive
https://github.com/big-data-europe

Hive documentation
https://cwiki.apache.org/confluence/display/Hive/Home#Home-UserDocumentation

Deploying the wiki (Confluence) with Docker

1. Writing the Dockerfile

FROM centos:6.6
 
ENV CONF_INST  /opt/atlassian/
ENV CONF_HOME  /var/atlassian/application-data/
 
 
COPY ./confluence-5.4.4.tar.gz /confluence-5.4.4.tar.gz
COPY ./application-data-init.tar.gz /application-data-init.tar.gz
RUN set -x && yum install -y tar && mkdir -p ${CONF_INST} && tar -xvf /confluence-5.4.4.tar.gz --directory "${CONF_INST}/"
 
COPY ./startup.sh /startup.sh
RUN chmod +x /startup.sh
 
EXPOSE 8090
VOLUME ["${CONF_HOME}", "${CONF_INST}"]
CMD ["/startup.sh"]

2. Writing docker-compose.yml

version: '3.1'
 
services:
  confluence:
    image: wiki:1.0
    restart: always
    ports:
      - 8090:8090
    #entrypoint: bash -c "ping 127.0.0.1"
    #command: bash -c "ping 127.0.0.1"
    #command: /opt/atlassian/confluence/bin/catalina.sh run
    volumes:
      - /data/atlassian/confluence/logs:/opt/atlassian/confluence/logs
      - /data/atlassian/confluence/logs:/opt/atlassian/application-data/confluence/logs
      - /data/atlassian/application-data:/var/atlassian/application-data
      - ./backups:/var/atlassian/application-data/confluence/backups
      - ./restore:/var/atlassian/application-data/confluence/restore:ro
      - /etc/localtime:/etc/localtime:ro
      - /etc/timezone:/etc/timezone:ro
    build:
      context: ./crack
      dockerfile: Dockerfile

Shadowsocks proxy configuration

services:
  ssserver:
    image: mritd/shadowsocks:3.2.0
    restart: always
    ports:
      - 8973:6443 
      - 8971:6500
    environment:
      SS_CONFIG: "-s 0.0.0.0 -p 6443 -m chacha20 -k My.123 --fast-open"
      KCP_FLAG: "false"
      KCP_MODULE: "kcpserver"
      KCP_CONFIG: "-t 127.0.0.1:6443 -l :6500 -mode fast2"

Point the shadowsocks client at port 8973 and it will connect.

Docker configuration for zipkin

Based on the configuration at https://github.com/openzipkin/docker-zipkin

version: '3.1'
 
services:
  storage:
    image: openzipkin/zipkin-mysql:2.11.7
    container_name: zipkin-mysql
    # Uncomment to expose the storage port for testing
    # ports:
    #   - 3306:3306
 
  zipkin:
    image: openzipkin/zipkin:2.11.7
    restart: always
    container_name: zipkin
    ports:
      - 9411:9411
    environment:
      - STORAGE_TYPE=mysql
      - MYSQL_HOST=zipkin-mysql
      # Uncomment to enable scribe
      - SCRIBE_ENABLED=true
      # Uncomment to enable self-tracing
      - SELF_TRACING_ENABLED=true
      # Uncomment to enable debug logging
      - JAVA_OPTS=-Dlogging.level.zipkin=DEBUG -Dlogging.level.zipkin2=DEBUG
    depends_on:
      - storage

MySQL master-slave replication configuration

https://github.com/getwingm/mysql-replica

version: '2'
services:
    master:
        image: twang2218/mysql:5.7-replica
        restart: unless-stopped
        ports:
            - 3306:3306
        environment:
            - MYSQL_ROOT_PASSWORD=master_passw0rd
            - MYSQL_REPLICA_USER=replica
            - MYSQL_REPLICA_PASS=replica_Passw0rd
        command: ["mysqld", "--log-bin=mysql-bin", "--server-id=1"]
    slave:
        image: twang2218/mysql:5.7-replica
        restart: unless-stopped
        ports:
            - 3307:3306
        environment:
            - MYSQL_ROOT_PASSWORD=slave_passw0rd
            - MYSQL_REPLICA_USER=replica
            - MYSQL_REPLICA_PASS=replica_Passw0rd
            - MYSQL_MASTER_SERVER=master
            - MYSQL_MASTER_WAIT_TIME=10
        command: ["mysqld", "--log-bin=mysql-bin", "--server-id=2"]