Why Nexus and not Artifactory? Compliance, Standards, Security, and Quality
We (Sonatype) recently received some support requests from a company making a switch from Artifactory to Nexus. In the evaluation and system design phase, they were setting up Nexus to proxy their internal Artifactory instance and were having some trouble with the integration. Our support staff did some digging, and the results were unexpected.
Before I get into the details, I just want to say that I don't derive much satisfaction from pointing out problems in Artifactory, and I won't claim Nexus is perfect either, but we pay very detailed attention to key areas like stability, performance and, most importantly, interoperability. Frankly, it isn't something I'd like to be spending my time on, but I've read so much hyperbole from JFrog about how configuring mirrorOf is "lazy and dirty", and so much trash talk about Sonatype just being "all talk", that I think it is time to start answering the criticism.
POM Rewriting and License Compliance
The customer was configuring their system to use the Procurement support in Nexus, and it was choking on validating the signatures of a lot of artifacts coming from their legacy Artifactory system. Upon investigation, we found that Artifactory completely rewrites the POM files, presumably as part of a new feature to strip out repository entries from the POMs. To see for yourself, compare the results of these two URLs:
http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom
and
http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom
Notice first that this pom has no repository element in it, therefore there is no need to modify the file at all. A closer evaluation will reveal that this pom being “proxied” by Artifactory is completely rewritten, removing all comments and reordering elements. I personally don’t think it’s a good idea to muck around with files being proxied but it’s probably fine assuming all the parsing is done correctly. It does introduce yet another place for things to go wrong though. I mean comments aren’t really that important are they? Well, if you care about open source licensing, they are. Take a look at this POM from Central:
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>maven</artifactId>
<groupId>org.apache.maven</groupId>
<version>2.0.10</version>
</parent>
Now take a look at the first few lines of the same POM from JFrog's public repository:
<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
<parent>
...
The License header of the file has been completely stripped away. I was pretty sure that this might be a violation of the
license itself, so I checked the Apache License at http://www.apache.org/licenses/LICENSE-2.0.
4.2 You must cause any modified files to carry prominent notices stating that You changed the files; and
4.3 You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works
I am not a lawyer, but I interpret this to mean that if you have this option turned on, and you are distributing these POMs to anyone else, you may be violating the license of the artifacts being proxied. This example uses the Apache License, a rather liberal license as they go, but as an active participant in the ASF, I can tell you that the organization takes licensing issues very seriously. The fact is that headers from any POM would be dumped, and most licenses out there probably frown upon this. Some of you are going to shrug this off as a minor problem, and maybe it is, but this is the sort of minor issue that will make a legal compliance department go berserk. Stripping licenses off of POMs wasn't really the main issue, though; it was just something I stumbled on while trying to find a solution to the problem with PGP signatures.
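If you want to check your own proxy for this, a rough sketch along the following lines may help. It compares the comments that appear before the root element in two copies of a POM. The function names and the sample strings are made up for illustration, and the regex approach is an assumption that only holds for simple POM prologs; it is not how Nexus or Artifactory inspect files.

```python
import re
from typing import List

# Matches XML comments; DOTALL lets one comment span several lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def leading_comments(pom_text: str) -> List[str]:
    """Return whitespace-normalized comments found before the root <project> element."""
    root = re.search(r"<project[\s>]", pom_text)
    prolog = pom_text[: root.start()] if root else pom_text
    return [" ".join(body.split()) for body in COMMENT_RE.findall(prolog)]


def missing_header_comments(original: str, proxied: str) -> List[str]:
    """List header comments present in the original POM but absent from the proxied copy."""
    kept = set(leading_comments(proxied))
    return [c for c in leading_comments(original) if c not in kept]


# Hypothetical, shortened stand-ins for the two POMs compared above.
original = """<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.
-->
<project><modelVersion>4.0.0</modelVersion></project>
"""
proxied = """<?xml version="1.0" encoding="UTF-8"?>
<project><modelVersion>4.0.0</modelVersion></project>
"""

missing = missing_header_comments(original, proxied)
print(len(missing))  # number of header comments the proxied copy dropped
```

In practice you would feed it the bodies fetched from the two URLs rather than inline strings.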
POM Rewriting and PGP Signatures
Setting aside the license issue for a moment, let’s go back to the
procurement issue that was reported. Now try getting the signature file
for this artifact so you can validate it hasn’t been tampered with.
The asc file should have a GPG signature that was created with a
publicly accessible key. Click on the following URL on central to see
an example.
http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
Ok so far? (That's my signature, fwiw.) Here's the crux of the issue. Click on the same artifact in the Artifactory proxy of Central below:
http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
At the time of writing, I get:
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
java.lang.IllegalArgumentException: Checksum type not found for path org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
org.artifactory.engine.DownloadServiceImpl.respondForChecksumRequest(DownloadServiceImpl.java:214)
org.artifactory.engine.DownloadServiceImpl.respond(DownloadServiceImpl.java:176)
org.artifactory.engine.DownloadServiceImpl.process(DownloadServiceImpl.java:122)
sun.reflect.GeneratedMethodAccessor93.invoke(Unknown Source)
My valid signature that exists on Central can no longer be retrieved through the proxy. I have no doubt they will fix the crash. However, the problem still stands: how can you have a web of trust that links back to the original developer when proxies in the middle are rewriting the artifact and stripping (or regenerating) the PGP signature? Even if you trust your instance, how can you validate that the signature was correct for the inbound artifact before it was rewritten? What if you're proxying from someone else that happens to be using Artifactory? Did they Trojan you, or just unwittingly break the web of trust?
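To make the point concrete: a detached signature covers the exact bytes that were published, so any rewrite, however harmless it looks, leaves the original .asc file with nothing to match. The sketch below is a simplification that uses a plain SHA-256 digest as a stand-in for what a signature covers; it does not call GPG, and the two POM strings are invented for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 of the exact bytes; a stand-in for the hash a detached signature covers."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical POM as published by the developer.
published = (
    b"<!-- license header -->\n"
    b"<project><artifactId>maven</artifactId>"
    b"<groupId>org.apache.maven</groupId></project>\n"
)
# Same elements after a proxy drops the comment and reorders the children.
rewritten = (
    b"<project><groupId>org.apache.maven</groupId>"
    b"<artifactId>maven</artifactId></project>\n"
)

signed_digest = digest(published)  # what the developer's signature would cover

print(digest(published) == signed_digest)  # True: untouched bytes still match
print(digest(rewritten) == signed_digest)  # False: the rewritten file no longer matches
```

The two documents mean the same thing to an XML parser, but a signature check only sees bytes, and the bytes differ.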
If you download things from the internet, validating PGP signatures isn't something you should think about doing; it is something you need to do. It is the only way to guarantee that the artifacts from a remote repository are sound, and Sonatype has invested a great deal of time into making sure that artifacts added to the Central Maven repositories, the Apache repositories, and the Codehaus repositories are all accompanied by valid PGP signatures made with keys that are on a public keyserver. In addition to that, the ASF takes the idea of building a web of trust very seriously. You shouldn't sign an ASF release unless you've had your key signed by someone in the ASF's web of trust at a key signing event (PGP keys are best signed only if you can verify someone's identity face-to-face). It seems a shame to throw away all of that work just to "clean" the POM of repository elements.
Again, JFrog has written publicly that the only reason this POM rewriting is necessary is because they think that Maven is broken by design. But their fix throws away the web of trust that makes it possible to validate the contents of a repository using original PGP keys from project developers. We've considered similar changes in the past, but because we are responsible for maintaining some of these source repositories, we are forced to think about the ramifications of our changes for the community. Building a repository manager that just "throws out" PGP signatures for POMs seems to me to be irresponsible when we're starting to gain traction on the difficult job of making sure that new artifacts added to Central have PGP signatures.
Artifactory Produces Non-standard Indexes
We also had some reports of odd indexing behavior. The original index format was a Lucene 2.3 binary file zipped up in a convenient archive. This created a problem because if you want to upgrade to a newer version of Lucene, you can no longer produce the older formatted version. Newer versions of Lucene cannot generate backwards-compatible binary index files. Because the community needed to maintain backwards compatibility for all older clients, the standard index that is produced by the major public repositories is now a new binary layout completely separate from Lucene. All of this work was done in the Nexus Indexer project, a separate, open-source project that has been available under the Eclipse Public License (EPL) and which is already integrated into all repository managers. This is the new .gz format. In addition to being a neutral format, it also supports incremental indexes. The indexes produced by Artifactory use the old-style Lucene zip, but with a newer version of Lucene. This means the format is non-standard and is not consumable by all IDE plugins or other index clients.
Another problem we found was that the indexes presented by the "virtual" repos (equivalent to Nexus group indexes) serve up only the index of the last repository in the list. This means that in an enterprise you cannot get an index that contains all artifacts available to you, both internal and external. While you can certainly use the Artifactory search interface, the promise of a repository index is that tools like m2eclipse and other Maven plugins can use this index to quickly locate artifacts that contain particular classes or quickly generate a list of versions for a particular artifactId.
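As a toy illustration of the difference, treat each member repository's index as a set of coordinates. This is only a model of the behavior described above: the real index is a binary format, and the function name and sample coordinates here are hypothetical.

```python
from typing import Iterable, Set


def merged_group_index(member_indexes: Iterable[Set[str]]) -> Set[str]:
    """Union of every member repository's index entries."""
    merged: Set[str] = set()
    for index in member_indexes:
        merged |= index
    return merged


# Hypothetical members of one group / virtual repository, in list order.
internal = {"com.example:billing:1.2", "com.example:billing:1.3"}
central_proxy = {
    "org.apache.maven:apache-maven:2.0.9",
    "org.apache.maven:apache-maven:2.0.10",
}
group = [internal, central_proxy]

full = merged_group_index(group)  # what a tool expects to search
last_only = group[-1]             # what it gets if only the last member is served

# Entries that an index client would never see in the last-member-only case.
print(sorted(full - last_only))
```

Here every internal artifact drops out of the index, which is exactly the set an enterprise most wants its IDE to find.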
Because it is important for all repository managers to produce interoperable repository indexes, we've decided to donate the Nexus Indexer code to the Apache Software Foundation. The Nexus index is the standard format for a Maven repository and is integrated into Archiva, Nexus, and Artifactory, so it just makes sense that the code that created this index be moved to an open, transparent community like Apache. This will increase the visibility of the Nexus Indexer code for people that actively participate in the Maven community.
Artifactory Breaks Wagon
Maven and the Maven Ant Tasks use something called Wagon to transfer files to and from a repository. It is the "transport abstraction that is used in Maven's artifact and repository handling code", and it has providers for SCP, HTTP, FTP, and file. Any time Maven sees a URL, the Maven Wagon component handles the transfer. I won't go into the gory details of this component, but one of the things that a repository manager needs to do is provide some sort of file list for a directory. All of the other protocols with Wagon providers have some way to get a directory listing. The basic subset of HTTP that is supported by all web servers does not have this command, so the HTTP wagon relies on the repository returning a list of links to the folder's contents.
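The idea of reading a file list out of a page of links can be sketched in a few lines. This is not Wagon's implementation (Wagon is Java); it is a minimal Python illustration, and the class name, the filtering rules, and the sample HTML are all assumptions made for the example.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML directory listing."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def file_list(listing_html: str) -> List[str]:
    """Return relative entries from a listing, skipping parent, sort, and absolute links."""
    collector = LinkCollector()
    collector.feed(listing_html)
    return [h for h in collector.links if not h.startswith(("..", "?", "/", "http"))]


# Hypothetical listing page of the kind a plain web server returns for a folder.
sample = """<html><body>
<a href="../">../</a>
<a href="apache-maven-2.0.10.pom">apache-maven-2.0.10.pom</a>
<a href="apache-maven-2.0.10.pom.asc">apache-maven-2.0.10.pom.asc</a>
</body></html>"""

print(file_list(sample))
```

A client built this way has nothing to work with if the server answers a folder URL with a redirect to a UI page instead of a page of links.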
Instead of returning such a list of folder contents, Artifactory tries to redirect the client to the UI. It doesn't return a file list, and anything in Maven that relies on Wagon's ability to get a file list will fail. In other words, anything in Maven or any Maven plugin that uses the wagon.getFileList() interface will break when you are using Artifactory. You can see it here:
[INFO] Scanning remote file system: http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.10/ ...
[INFO] apache-maven-2.0.10-bin.tar.bz2
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc.md5
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc.sha1
[INFO] apache-maven-2.0.10-bin.tar.bz2.md5
[INFO] apache-maven-2.0.10-bin.tar.bz2.sha1
[INFO] apache-maven-2.0.10-bin.tar.gz
[INFO] apache-maven-2.0.10-bin.tar.gz.asc
[INFO] apache-maven-2.0.10-bin.tar.gz.asc.md5
[INFO] apache-maven-2.0.10-bin.tar.gz.asc.sha1
[INFO] apache-maven-2.0.10-bin.tar.gz.md5
[INFO] apache-maven-2.0.10-bin.tar.gz.sha1
[INFO] apache-maven-2.0.10-bin.zip
[INFO] apache-maven-2.0.10-bin.zip.asc
[INFO] apache-maven-2.0.10-bin.zip.asc.md5
[INFO] apache-maven-2.0.10-bin.zip.asc.sha1
[INFO] apache-maven-2.0.10-bin.zip.md5
[INFO] apache-maven-2.0.10-bin.zip.sha1
[INFO] apache-maven-2.0.10-sources.jar
[INFO] apache-maven-2.0.10-sources.jar.asc
[INFO] apache-maven-2.0.10-sources.jar.asc.md5
[INFO] apache-maven-2.0.10-sources.jar.asc.sha1
[INFO] apache-maven-2.0.10-sources.jar.md5
[INFO] apache-maven-2.0.10-sources.jar.sha1
[INFO] apache-maven-2.0.10.jar
[INFO] apache-maven-2.0.10.jar.asc
[INFO] apache-maven-2.0.10.jar.asc.md5
[INFO] apache-maven-2.0.10.jar.asc.sha1
[INFO] apache-maven-2.0.10.jar.md5
[INFO] apache-maven-2.0.10.jar.sha1
[INFO] apache-maven-2.0.10.pom
[INFO] apache-maven-2.0.10.pom.asc
[INFO] apache-maven-2.0.10.pom.asc.md5
[INFO] apache-maven-2.0.10.pom.asc.sha1
[INFO] apache-maven-2.0.10.pom.md5
[INFO] apache-maven-2.0.10.pom.sha1
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 32 seconds
[INFO] Finished at: Mon Jan 04 17:17:11 EST 2010
[INFO] Final Memory: 8M/47M
[INFO] ------------------------------------------------------------------------
C:\svn\staging-test>mvn validate
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building staging-test
[INFO] task-segment: [validate]
[INFO] ------------------------------------------------------------------------
[INFO] [wagon:list {execution: upload-javadoc}]
[INFO] Scanning remote file system: http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.10/ ...
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Error handling resource
Embedded error: Error transferring file
Server redirected too many times (20)
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
Summary
Last time I wrote something about Artifactory, the founders of the company came back and called me biased and not objective. Judge for yourself; I've presented some concrete facts in this post.
I have to tell you that the thing that really struck a chord with me and the other engineers at Sonatype was the idea that someone could write a blog post saying that Sonatype is "all talk". It just doesn't make any sense: as a corporation we've poured resources into the foundational technologies that our competitors use. I spend a great deal of my time working on the Maven project and stopping Denial of Service attacks on Central. I'm on the PMC, and a lot of that time is spent trying to make Maven a better product. A lot of this work involves talking to our competitors about ways to improve Maven and related technologies. To hear someone come at us because we're "all talk" is, frankly, insulting given the hours (no, years) we've put into this open source community.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)






Comments
Stephen Colebourne replied on Fri, 2010/01/08 - 4:20am
Brian Fox replied on Fri, 2010/01/08 - 8:32am
in response to:
Stephen Colebourne
Geoffrey De Smet replied on Fri, 2010/01/08 - 11:10am
Are any of the JFrog guys committers in Maven? Keep up the good work on Maven 3 (not just the sonatype committers, all the maven committers).
Brian Fox replied on Fri, 2010/01/08 - 11:31am
in response to:
Geoffrey De Smet
Shlomi Ben-haim replied on Sat, 2010/01/09 - 2:47pm
You should get your facts right before slandering other products/technologies! This is especially true if you try to imply that Artifactory is doing something illegal.
If you had looked closely, you'd see that Artifactory doesn't modify the original remotely cached POMs stored in the repository (http://repo.jfrog.org/artifactory/repo1/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom), so you can always perform reliable license validation (which you do on the repository side anyway - even Nexus supports this).
Thank you for your QA work and "legal" efforts, but we trust our users to know that if they activate an optional feature that modifies a POM in order to help them build, they should nevertheless distribute their product with the original POM (if they distribute it with the POM at all, which is uncommon).
And yes, there are many broken POMs out there, and yes the concept of auto-active repositories in POMs is a serious problem in Maven, and I can tell from experience that many organizations will end up having manually edited POMs in their repository to work around these and other problems.
It also looks like you and the company you work for, Sonatype, would like developers to have a single-vendor, single technology-stack market! The reason for this post is not to educate users, but to trash Nexus' only powerful competitor. This aggressive, "we know better than anyone" behavior is unfortunately typical when it comes to alternative products and technologies, and you spend a great deal of your time spreading it.
You just voted down a pure technical blog that covers alternative technologies (IDEA/Gradle and not m2Eclipse/Maven - you cleared your vote already, but the original is here). Why would you do that? Also, why do you treat fellow Apache projects, like Archiva, with such contempt, putting your Apache PMC hat on and off as long as it serves your commercial interests?
You have learned a lot from the experience of Artifactory while building Nexus and are still learning - just recently copying, to a limited degree of usefulness, the concept of artifact-based searchable metadata, adding support for multiple app-servers etc. Maybe you should also learn to ask yourself what you are doing right and what you are doing wrong, and if things that you may try to impose as standards are going in the right direction (making people download security-ignorant, and constantly out-of-date indexes instead of making repositories searchable, wagon not properly following redirects, non-preemptive authentication in wagon etc.)
You think of yourself as a standard maker and a best-practice expert, but you have to learn to deal with criticism and accept the fact that while there are things that you do well, there are also areas where others may do better - whether these are competing products or competing technologies.
We hope that in the future you will respect the competition and its value for all of us.
Miya Nadri replied on Sat, 2010/01/09 - 3:59pm
Brian Fox replied on Sun, 2010/01/10 - 12:12am
I think two comments here are worth responding to:
First the Gradle vote was clearly an accident. As you noted, I had already cleared my vote before your comment. Somehow I had clicked it when viewing the rising link list and I reset it when I noticed later that day. I never noticed before that you could vote directly in the rising link tab...which frankly seems pointless, why let you vote without first reading an article or at least the summary?
I don't vote down things simply because I don't like the technology, and voting down a blog about adding Gradle support to IDEA is completely pointless, since I don't use either of them (though I'm flattered that you watch me enough to take screenshots of votes).
For the record, I have nothing against Gradle or IDEA and neither does Sonatype. Take a look at many of the talks Jason has done, even he says use what you want, we just care about interoperability at the repository layer...which btw was the overarching theme of this post.
Second, on copying your property support, I can't fault you for thinking this was a copy given the timing of releases, but we started RDF-based indexing to support the metadata in July. Alin has been working on this type of technology for years and it was one of the first things he did when joining Sonatype. Unfortunately for us, the 1.4 release held up some of this work. You don't even have to take my word for it, see the history here and here. It looks like your property support was released on October 29th, far after we were done developing the plugin. I could point out plenty of examples in the opposite direction, but I don't see any point...
I think Paul Graham would sum up your comment as a series of DH1 Ad Hominem attacks, since you didn't really address any of the key technical facts that I pointed out.