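StreamSets is easy to use and feature-rich, but it is no longer open source. The same goes for Cloudera, which is a real pity.
By the time I came across StreamSets, you already had to register to download it, and registration would not go through for me. Not every door has been closed, though: you can still build it yourself. Source downloads: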
https://codeload.github.com/designmind/datacollector-plugin-api/zip/refs/heads/master
https://codeload.github.com/designmind/datacollector/zip/refs/heads/master
https://codeload.github.com/designmind/datacollector-api/zip/refs/heads/master
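Building it is enough to drive you up the wall, because many of the URLs the build depends on are dead.
Environment needed on the server:
Java, Maven, Node.js, Bower, and npm. Install these yourself.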
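Before starting, it can save time to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of the original procedure; the function name check_tools is made up for illustration, and the binary names passed to it are the usual ones and may differ on your system (some distributions install Node.js as nodejs).

```shell
# Report which of the given commands are available on PATH.
# Returns non-zero if at least one is missing.
# check_tools is a hypothetical helper name, not something StreamSets provides.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_tools java mvn node npm bower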
1. Unzip the three downloaded zip files.
2. Build datacollector-api-master
cd datacollector-api-master
mvn install -DskipTests
3. Build datacollector-plugin-api-master
cd datacollector-plugin-api-master
mvn install -DskipTests
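The order matters because mvn install puts each API artifact into the local Maven repository, where the later builds look it up. The two steps above can be sketched as a small helper that stops at the first failure. This is an illustrative sketch only: build_in_order and the BUILD_CMD override are hypothetical names introduced here (BUILD_CMD exists so the loop can be dry-run with a harmless command); the default command is the one used in the steps above.

```shell
# Build each given source directory in order, stopping at the first failure.
# build_in_order and BUILD_CMD are hypothetical names used for illustration.
# BUILD_CMD defaults to the command from the steps above.
build_in_order() {
    for dir in "$@"; do
        echo "building $dir"
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$dir" && ${BUILD_CMD:-mvn install -DskipTests} ) || {
            echo "build failed in $dir" >&2
            return 1
        }
    done
}
```

Usage, from the directory that holds the unzipped sources: build_in_order datacollector-api-master datacollector-plugin-api-master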
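4. Build datacollector-master
Many of the repository URLs in this project no longer work, and a lot of packages have been taken down, so pom.xml has to be updated: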
[pom.xml listing. The XML markup did not survive, only the element values. What can be recovered:]

Project coordinates:
  modelVersion: 4.0.0
  groupId: com.streamsets
  artifactId: streamsets-datacollector
  version: 3.23.0-SNAPSHOT
  name: StreamSets Data Collector
  packaging: pom
  url: http://www.streamsets.com
  scm: https://github.com/streamsets/datacollector
  license: Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.txt)

Managed dependencies:
  com.streamsets : streamsets-datacollector-api : ${datacollector-api.version}
  com.streamsets : streamsets-datacollector-spark-api : ${datacollector-spark-api.version}
  javax.servlet : javax.servlet-api : 3.1.0
  (two properties are set to 3.23.0-SNAPSHOT, presumably the two version properties above)

Repositories (id, URL):
  cdh.plugin.repo            https://repository.cloudera.com/artifactory/cloudera-repos
  cdh.repo                   https://repository.cloudera.com/artifactory/cloudera-repos
  confluent                  http://packages.confluent.io/maven/
  elasticsearch-releases     https://artifacts.elastic.co/maven
  mapr-releases              http://repository.mapr.com/maven/
  HDPReleases                http://repo.hortonworks.com/content/repositories/releases/
  HDPRehosted                http://repo.hortonworks.com/content/repositories/releases/
  HDPJetty                   http://repo.hortonworks.com/content/repositories/jetty-hadoop/
  snapshots-repo             https://oss.sonatype.org/content/repositories/snapshots
  kinetica-releases          http://files.kinetica.com/nexus/content/repositories/releases/
  bintray-databricks-maven   https://maven.aliyun.com/repository/central
  spring                     https://maven.aliyun.com/repository/spring
  central                    https://maven.aliyun.com/repository/central
  mapr-public                https://maven.aliyun.com/repository/mapr-public

The last four entries point at the Aliyun mirrors. The listing also contains the full module list, the apache-rat exclusion patterns, and the profile definitions, but their nesting cannot be recovered from the flattened text.
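Go into datacollector-master/datacollector-ui and edit pom.xml. I can't afford a proxy, and downloads of resources from GitHub fail all the time, so the install arguments become:
install --offline
Install the JS packages listed in bower.json by hand. I won't go through that here; if your connection is poor, it is best to download them one at a time. Finish downloading before you build, otherwise the build of the whole project will fail.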
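Since the downloads fail intermittently, one way to take some of the pain out of fetching the packages by hand is to wrap each download in a retry loop. The helper below is a generic sketch and not something from the original build; retry and RETRY_DELAY are names made up for this example.

```shell
# Run a command up to N times, stopping at the first success.
# Usage: retry N command [args...]
# retry and RETRY_DELAY are hypothetical names used for illustration.
# RETRY_DELAY is the pause in seconds between attempts (default 5).
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
        if [ "$n" -le "$attempts" ]; then
            sleep "${RETRY_DELAY:-5}"
        fi
    done
    return 1
}
```

Usage, from the datacollector-ui directory: retry 5 bower install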
Go back to datacollector-master.
Build in release mode:
mvn -T 8 clean package -Drelease -DskipTests -P-rp
5. Now wait, and wait some more. Once the build succeeds, the packages are all under datacollector-master/release/target.



