28 commits
4808b1c
Add looper yml file
mliu014 Mar 16, 2018
58f0e05
Fix the bugs with dirs during merging
mliu014 Mar 21, 2018
e3cd2b7
Update README.md
mliu014 Apr 30, 2018
a08c27b
Upload documentations
mliu014 May 2, 2018
4194cf7
Update README.md
mliu014 May 2, 2018
b01a2d7
Update README.md
mliu014 May 2, 2018
8554f45
Update README.md
mliu014 May 2, 2018
354dc02
This is an auto-generated PR with initial version of insights metadat…
May 3, 2018
6b1887b
Update looper yaml file
mliu014 May 3, 2018
63a0f9d
Add release bits and copy jar path
mliu014 May 3, 2018
72af78f
Bug fix
mliu014 May 3, 2018
08efae1
Merge pull request #3 from LabsBFD/insights-metadata-25
mliu014 May 3, 2018
2501347
Add more documentations
mliu014 May 17, 2018
f397cea
Merge branch 'master' of gecgithub01.walmart.com:LabsBFD/bfd-ceph-swifta
mliu014 May 17, 2018
98f6327
backup parts->staging parts in Readme
mliu014 May 18, 2018
b7d424c
Add or replace with walmart license.
rayzhang123 Jun 18, 2018
642ab65
Update LICENSE
sxia9 Jun 27, 2018
cb03940
Merge pull request #4 from sxia9/patch-1
mliu014 Jun 28, 2018
6f7f6da
Replace copyright from 2011 to 2018.
rayzhang123 Jun 29, 2018
8142171
Merge branch 'master' of https://gecgithub01.walmart.com/LabsBFD/hado…
rayzhang123 Jun 29, 2018
26f94da
Update README.md
mliu014 Jun 30, 2018
292d879
Update LICENSE
mliu014 Jun 30, 2018
9537b8f
Remove the unnecessary dependencies
mliu014 Jul 9, 2018
bc585ab
Update README
mliu014 Jul 9, 2018
23f4a62
Added the NOTICE file
mliu014 Jul 17, 2018
ad77c06
Back to 3.1.0 for released artifacts
mliu014 Jul 17, 2018
b6b62a0
Fix a typo: latge
rayzhang123 Jul 18, 2018
6abd269
Delete .insights.yml
mli014 Nov 11, 2020
75 changes: 75 additions & 0 deletions .looper.yml
@@ -0,0 +1,75 @@
envs:
  global:
    variables:
      component: "" # set inside main flows
      packaging: "jar"
      proximityBaseURL: "https://repository.walmart.com/content/repositories"
      snapshotProximityRepoID: "pangaea_snapshots"
      snapshotProximityRepoURL: "${proximityBaseURL}/${snapshotProximityRepoID}"

tools:
  jdk: 8
  maven: 3.5.0

triggers:
  - manual: Run default
  - manual:
      name: Deploy Snapshot
      call: deployAllSnapshots
  - manual:
      name: Deploy Release
      call: deployAllReleases

flows:
  default:
    - call: versionsCheck
    - exposeVars(maven)
    - var(version = "${MAVEN_VERSION}")
    - call: buildAll

  versionsCheck:
    - (name Versions init) echo "Versions"
    - (name JDK Version) java -version
    - (name Maven version) mvn -v

  # ${version} must already be set
  buildAll:
    - shell (name Version file): 'echo "${version}" > VERSION'
    - var(component = "hadoop-openstack")
    - call: buildComponent

  buildComponent:
    - shell(name Component intro): echo "Building ${component}"
    - shell(name Component jar build): mvn clean package -DskipTests
    - shell(name Component jar copy): cp target/${component}-${version}.${packaging} /tmp/${component}-${version}.${packaging}

  deployAllSnapshots:
    - call: versionsCheck
    - exposeVars(maven)
    - var(version = "${MAVEN_VERSION}")
    - call: buildAll
    - var(component = "hadoop-openstack")
    - call: deploySnapshot

  deploySnapshot:
    - var(jarPath = "/tmp/${component}-${version}.${packaging}")
    - shell (name Maven init): echo "Deploying ${component} snapshot to Proximity ${version}"
    - shell (name Maven deploy-file): mvn -B clean deploy:deploy-file -DartifactId="${component}" -DrepositoryId="${snapshotProximityRepoID}" -Durl="${snapshotProximityRepoURL}" -Dfile="${jarPath}" -Dpackaging="${packaging}" -DpomFile=pom.xml

  deployAllReleases:
    - call: versionsCheck
    - exposeVars(maven)
    - var(version = '%{MAVEN_VERSION.replace("-SNAPSHOT", "")}')
    - call: buildAll
    - call: generateRelease
    - var(component = "hadoop-openstack")
    - call: deployRelease

  generateRelease:
    - shell (name Maven release-title): echo "Preparing the release"
    - (name Maven release): mvn -B clean release:prepare -DtagNameFormat='@{project.version}'

  deployRelease:
    - var(jarPath = "/tmp/${component}-${version}.${packaging}")
    - shell (name Maven init): echo "Deploying ${component} releases to Proximity ${version}"
    - shell (name Maven release-deploy): mvn -B clean deploy:deploy-file -DartifactId="${component}" -DrepositoryId="${releaseProximityRepoID}" -Durl="${releaseProximityRepoURL}" -Dfile="${jarPath}" -Dpackaging="${packaging}" -DupdateReleaseInfo=true -Dversion="${version}" -DpomFile=pom.xml
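
The release flow in this file derives its version by removing `-SNAPSHOT` from `MAVEN_VERSION` and then reuses the `/tmp/${component}-${version}.${packaging}` path that `buildComponent` writes. The following is only a rough shell sketch of that derivation, not part of the Looper file: the `MAVEN_VERSION` value is a hypothetical example, and the shell expansion strips only a trailing `-SNAPSHOT`, whereas the `replace` call in the flow removes the substring wherever it occurs.

```shell
#!/bin/sh
# Hypothetical value; in the Looper flows this comes from MAVEN_VERSION.
MAVEN_VERSION="3.1.0-SNAPSHOT"
component="hadoop-openstack"
packaging="jar"

# Drop a trailing -SNAPSHOT suffix (no-op if the suffix is absent).
version="${MAVEN_VERSION%-SNAPSHOT}"

# Same path that buildComponent copies the jar to.
jarPath="/tmp/${component}-${version}.${packaging}"

echo "version=${version}"
echo "jarPath=${jarPath}"
```

With the example value, the jar path resolves to `/tmp/hadoop-openstack-3.1.0.jar`, which is the file the `deploy:deploy-file` step would pick up via `-Dfile`.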
1,050 changes: 11 additions & 1,039 deletions LICENSE

Large diffs are not rendered by default.

95 changes: 95 additions & 0 deletions NOTICE
@@ -0,0 +1,95 @@
Compared to sahara-extra's icehouse branch https://github.com/openstack/sahara-extra/tree/icehouse-eol (where the majority of the code is not much different from the latest hadoop-openstack codebase in https://hadoop.apache.org/docs/r3.1.0/hadoop-openstack/index.html), this project has the following feature improvements:

1. Multi-threaded parallel deletes
2. Multi-threaded parallel copies
3. Multi-threaded parallel renames
4. Fixed thread management in the existing code and introduced redesigned, custom thread management throughout
5. Support for Dynamic Large Objects (DLOs) and multi-part uploads to overcome object size limits in object storage
6. Added pagination for listing large numbers of objects
7. Re-designed the range seek
8. Added lazy seek to hugely improve read performance
9. Introduced four upload policies: MULTIPART_SPLIT (default), MULTIPART_NO_SPLIT, MULTIPART_SINGLE_THREAD and MULTIPART_SPLIT_BLOCK
10. Added metrics, logging, and monitoring for better troubleshooting


Newly Added:
src/main/java/org/apache/hadoop/fs/swifta/exceptions: SwiftMetricWrongParametersException.java
src/main/java/org/apache/hadoop/fs/swifta/http: DaemonThreadFactory.java
src/main/java/org/apache/hadoop/fs/swifta/http: HttpClientManager.java
src/main/java/org/apache/hadoop/fs/swifta/http: IdleConnectionMonitorThread.java
src/main/java/org/apache/hadoop/fs/swifta/http: SwiftClientConfig.java
src/main/java/org/apache/hadoop/fs/swifta/http: SwiftClientConfigFactory.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: InputstreamMetrics.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: MetricsFactory.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: OutputstreamMetrics.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: SwiftMetric.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: SwiftRestClientMetrics.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: SwiftaFileSystemMetrics.java
src/main/java/org/apache/hadoop/fs/swifta/metrics: SwiftaFileSystemStoreMetrics.java
src/main/java/org/apache/hadoop/fs/swifta/model: ListObjectsRequest.java
src/main/java/org/apache/hadoop/fs/swifta/model: ObjectsList.java
src/main/java/org/apache/hadoop/fs/swifta/snative: AsynchronousUpload.java
src/main/java/org/apache/hadoop/fs/swifta/snative: BackupFile.java
src/main/java/org/apache/hadoop/fs/swifta/snative: RangeInputStream.java
src/main/java/org/apache/hadoop/fs/swifta/snative: SwiftNativeOutputStreamMultiPartSingleThread.java
src/main/java/org/apache/hadoop/fs/swifta/snative: SwiftNativeOutputStreamMultipartNoSplit.java
src/main/java/org/apache/hadoop/fs/swifta/snative: SwiftNativeOutputStreamMultipartWithSplit.java
src/main/java/org/apache/hadoop/fs/swifta/snative: SwiftNativeOutputStreamMultipartWithSplitBlock.java
src/main/java/org/apache/hadoop/fs/swifta/snative: SwiftOutputStream.java
src/main/java/org/apache/hadoop/fs/swifta/util: JsonUtil.java
src/main/java/org/apache/hadoop/fs/swifta/util: PriorityThreadFactory.java
src/main/java/org/apache/hadoop/fs/swifta/util: ThreadManager.java
src/main/java/org/apache/hadoop/fs/swifta/util: ThreadUtils.java

Changes:
src/main/java/org/apache/hadoop/fs/swifta/auth/ApiKeyAuthenticationRequest.java
src/main/java/org/apache/hadoop/fs/swifta/auth/ApiKeyCredentials.java
src/main/java/org/apache/hadoop/fs/swifta/auth/AuthenticationRequest.java
src/main/java/org/apache/hadoop/fs/swifta/auth/AuthenticationRequestWrapper.java
src/main/java/org/apache/hadoop/fs/swifta/auth/AuthenticationResponse.java
src/main/java/org/apache/hadoop/fs/swifta/auth/AuthenticationWrapper.java
src/main/java/org/apache/hadoop/fs/swifta/auth/KeyStoneAuthRequest.java
src/main/java/org/apache/hadoop/fs/swifta/auth/KeystoneApiKeyCredentials.java
src/main/java/org/apache/hadoop/fs/swifta/auth/PasswordAuthenticationRequest.java
src/main/java/org/apache/hadoop/fs/swifta/auth/PasswordCredentials.java
src/main/java/org/apache/hadoop/fs/swifta/auth/Roles.java
src/main/java/org/apache/hadoop/fs/swifta/auth/entities/AccessToken.java
src/main/java/org/apache/hadoop/fs/swifta/auth/entities/Catalog.java
src/main/java/org/apache/hadoop/fs/swifta/auth/entities/Endpoint.java
src/main/java/org/apache/hadoop/fs/swifta/auth/entities/Tenant.java
src/main/java/org/apache/hadoop/fs/swifta/auth/entities/User.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftAuthenticationFailedException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftBadRequestException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftConfigurationException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftConnectionClosedException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftConnectionException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftInternalStateException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftInvalidResponseException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftJsonMarshallingException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftNotDirectoryException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftOperationFailedException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftPathExistsException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftThrottledRequestException.java
src/main/java/org/apache/hadoop/fs/swifta/exceptions/SwiftUnsupportedFeatureException.java
src/main/java/org/apache/hadoop/fs/swifta/http/CopyMethod.java
src/main/java/org/apache/hadoop/fs/swifta/http/ExceptionDiags.java
src/main/java/org/apache/hadoop/fs/swifta/http/HttpBodyContent.java
src/main/java/org/apache/hadoop/fs/swifta/http/HttpInputStreamWithRelease.java
src/main/java/org/apache/hadoop/fs/swifta/http/RestClientBindings.java
src/main/java/org/apache/hadoop/fs/swifta/http/SwiftProtocolConstants.java
src/main/java/org/apache/hadoop/fs/swifta/http/SwiftRestClient.java
src/main/java/org/apache/hadoop/fs/swifta/snative/SwiftFileStatus.java
src/main/java/org/apache/hadoop/fs/swifta/snative/SwiftNativeFileSystem.java
src/main/java/org/apache/hadoop/fs/swifta/snative/SwiftNativeFileSystemStore.java
src/main/java/org/apache/hadoop/fs/swifta/snative/SwiftNativeInputStream.java
src/main/java/org/apache/hadoop/fs/swifta/snative/SwiftObjectFileStatus.java
src/main/java/org/apache/hadoop/fs/swifta/util/Duration.java
src/main/java/org/apache/hadoop/fs/swifta/util/DurationStats.java
src/main/java/org/apache/hadoop/fs/swifta/util/DurationStatsTable.java
src/main/java/org/apache/hadoop/fs/swifta/util/SwiftObjectPath.java
src/main/java/org/apache/hadoop/fs/swifta/util/SwiftTestUtils.java
src/main/java/org/apache/hadoop/fs/swifta/util/SwiftUtils.java

Unchanged:
src/main/java/org/apache/hadoop/fs/swifta/snative/StrictBufferedFSInputStream.java
57 changes: 48 additions & 9 deletions README.md
@@ -1,14 +1,53 @@
# bfd-ceph-swifta
A Hadoop Swift-API compatible file system driver, based on sahara-extra, that is tested against OpenStack Ceph.
## Usage:
1) Set the value of "fs.swifta.impl" in core-site.xml to "org.apache.hadoop.fs.swifta.snative.SwiftNativeFileSystem".
# hadoop-openstack-swifta

2) You may want to do the same for Hive, Spark or Presto if they have their own core-site.xml.
This module enables Apache Hadoop applications, including MapReduce jobs, to read and write data to and from instances of the OpenStack Swift object store. It substantially rewrites the existing hadoop-openstack Swift driver from the icehouse release of the OpenStack sahara-extra project: https://github.com/openstack/sahara-extra/tree/icehouse-eol. It can be embedded into the hadoop-openstack submodule of the Hadoop codebase: https://github.com/apache/hadoop/tree/trunk/hadoop-tools/hadoop-openstack, much as the hadoop-aws s3a connector was a major enhancement over s3n: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html. This codebase has been tested extensively against Swift-API compatible Ceph Jewel 10.2.7 object storage.

3) Copy hadoop-openstack-*.jar to $HADOOP_HOME/share/hadoop/tools/lib/ and link the same jar to $HADOOP_HOME/share/hadoop/hdfs/lib/
## How to build and test

4) You are ready to go, make sure to use the same swifta:// protocol.
The hadoop-openstack-swifta codebase can be remotely tested against any public or private cloud infrastructure which supports the OpenStack Keystone authentication mechanism. It can also be tested against private OpenStack clusters. OpenStack Development teams are strongly encouraged to test the Hadoop swift filesystem client against any version of Swift that they are developing or deploying, to stress their cluster and to identify bugs early.

The module comes with a large suite of JUnit tests. They are only executed if the source tree includes credentials to test against a specific cluster.

Create the file:

src/test/resources/auth-keys.xml

Into this file, insert the credentials needed to bond to the test filesystem, as described above.

Next, set the property test.fs.swifta.name to the URL of a Swift container to test against. The tests expect exclusive access to this container: do not keep any other data on it, or expect it to be preserved.

<property>
<name>test.fs.swifta.name</name>
<value>swifta://test-container.test-region/</value>
</property>

Build swifta package:

mvn clean package -DskipTests

This builds a set of Hadoop JARs consistent with the hadoop-openstack module that is about to be tested.

mvn test -Dtest=TestSwiftRestClient

This runs some simple tests which include authenticating against the remote Swift service. If these tests fail, so will all the rest. If they do fail, check your authentication.

Once this test succeeds, you can run the full test suite:

mvn test
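
The build-and-test sequence above can be chained so that it stops at the first failure. This is a minimal sketch, not a script shipped with the project: the function name `run_swifta_tests` and the `MVN` override variable are hypothetical, and it assumes it is run from the root of the source tree.

```shell
#!/bin/sh
# Sketch only: MVN is a hypothetical override hook so the Maven command can be stubbed.
MVN="${MVN:-mvn}"
AUTH_KEYS="src/test/resources/auth-keys.xml"

run_swifta_tests() {
  # The tests need credentials, so bail out early if the file is missing.
  if [ ! -f "$AUTH_KEYS" ]; then
    echo "missing $AUTH_KEYS" >&2
    return 1
  fi
  # Build, run the authentication smoke test, then the full suite.
  "$MVN" clean package -DskipTests &&
    "$MVN" test -Dtest=TestSwiftRestClient &&
    "$MVN" test
}
```

Calling `run_swifta_tests` returns non-zero as soon as one step fails, so a failure in TestSwiftRestClient prevents the full suite from running.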


## How to configure a Hadoop cluster with swifta:

1) Build swifta: mvn clean package -DskipTests

2) Add the following snippet to core-site.xml:

<property>
<name>fs.swifta.impl</name>
<value>org.apache.hadoop.fs.swifta.snative.SwiftNativeFileSystem</value>
</property>

3) Copy hadoop-openstack-*.jar to $HADOOP_HOME/share/hadoop/tools/lib/ and link the same jar to $HADOOP_HOME/share/hadoop/common/lib/

4) You are ready to go. Make sure to use the swifta:// protocol, e.g.: hadoop fs -ls swifta://test-container.test-region/
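
Step 3 above (copy the jar, then link it) can be scripted roughly as follows. This is a sketch under assumptions rather than an official install script: the function name `install_swifta_jar` is hypothetical, the Hadoop home is expected to be an absolute path, and both lib directories must already exist.

```shell
#!/bin/sh
# Usage: install_swifta_jar /path/to/hadoop-openstack-<version>.jar /absolute/hadoop/home
install_swifta_jar() {
  jar="$1"
  hadoop_home="$2"
  tools_lib="$hadoop_home/share/hadoop/tools/lib"
  common_lib="$hadoop_home/share/hadoop/common/lib"
  jar_name=$(basename "$jar")

  # Refuse to guess: both directories must already be part of the install.
  [ -d "$tools_lib" ] && [ -d "$common_lib" ] || return 1

  # Copy into tools/lib, then link the same file into common/lib.
  cp "$jar" "$tools_lib/" || return 1
  # -f replaces a stale link left by an earlier install.
  ln -sf "$tools_lib/$jar_name" "$common_lib/$jar_name"
}
```

For example, after `mvn clean package -DskipTests`, something like `install_swifta_jar target/hadoop-openstack-3.1.0.jar "$HADOOP_HOME"` would apply the step on one node; the jar name here follows the `target/${component}-${version}.${packaging}` pattern from `.looper.yml`.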

This branch adds a content-type to all dummy folders. For all files, even 0-byte ones, the content_type stays empty.
#### This project only supports Java 7 and above.
37 changes: 4 additions & 33 deletions pom.xml
@@ -19,11 +19,11 @@
<parent>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-project</artifactId>
<version>3.2.0-SNAPSHOT</version>
<version>3.1.0</version>
<relativePath>../../hadoop-project</relativePath>
</parent>
<artifactId>hadoop-openstack</artifactId>
<version>3.2.0-SNAPSHOT</version>
<version>3.1.0</version>
<name>Apache Hadoop OpenStack support</name>
<description>
This module contains code to support integration with OpenStack.
@@ -46,7 +46,7 @@
</file>
</activation>
<properties>
<maven.test.skip>false</maven.test.skip>
<maven.test.skip>true</maven.test.skip>
</properties>
</profile>
<profile>
@@ -100,48 +100,19 @@
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<scope>compile</scope>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<scope>test</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-annotations</artifactId>
<scope>compile</scope>
</dependency>

<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<scope>compile</scope>
</dependency>

<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
@@ -1,18 +1,20 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more contributor license
* agreements. See the NOTICE file distributed with this work for additional information regarding
* copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at
* Copyright (c) [2018]-present, Walmart Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


package org.apache.hadoop.fs.swifta.auth;

import java.util.Objects;
@@ -1,18 +1,20 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more contributor license
* agreements. See the NOTICE file distributed with this work for additional information regarding
* copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at
* Copyright (c) [2018]-present, Walmart Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


package org.apache.hadoop.fs.swifta.auth;

import java.util.Objects;
@@ -1,18 +1,20 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more contributor license
* agreements. See the NOTICE file distributed with this work for additional information regarding
* copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance with the License. You may obtain a
* copy of the License at
* Copyright (c) [2018]-present, Walmart Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software distributed under the License
* is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
* or implied. See the License for the specific language governing permissions and limitations under
* the License.
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


package org.apache.hadoop.fs.swifta.auth;

import java.util.Objects;