A results-oriented leader with 14 years of leadership experience and a proven ability to mentor and lead local and remote teams of software engineers and testers to develop sophisticated software solutions on aggressive schedules. Focused on using technology, process and development resources in a pragmatic way to deliver functionality to end users that is high quality and most importantly meets their needs. Demonstrated ability to adapt quickly to business needs and learn new job functions and skills with minimal to no supervision. Brian is a DZone MVB and is not an employee of DZone and has posted 4 posts at DZone.

Why Nexus and not Artifactory? Compliance, Standards, Security, and Quality

01.07.2010

We (Sonatype) recently received some support requests from a company making a switch from Artifactory to Nexus. In the evaluation and system design phase, they were setting up Nexus to proxy their internal Artifactory instance and were having some trouble with integration. Our support staff did some digging and the results were unexpected.

Before I get into the details, I just want to say that I don't derive much satisfaction from pointing out problems in Artifactory, and I won't claim Nexus is perfect either, but we pay close attention to key areas like stability, performance and, most importantly, interoperability. Frankly, it isn't something I'd like to be spending my time on, but I've read so much hyperbole from JFrog about how configuring mirrorOf is "lazy and dirty", and so much trash talk about Sonatype just being "all talk", that I think it is time to start answering the criticism.

POM Rewriting and License Compliance

The customer was configuring their system to use the Procurement support in Nexus, and it was choking on validating the signatures of a lot of artifacts coming from their legacy Artifactory system. Upon investigation, we found that Artifactory completely rewrites the POM files, presumably as part of a new feature to strip out repository entries from the POMs. To see for yourself, compare the results of these two URLs:

http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom

and

http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom

Notice first that this POM has no repository element in it, so there is no need to modify the file at all. A closer look reveals that the POM being “proxied” by Artifactory is completely rewritten, with all comments removed and elements reordered. I personally don’t think it’s a good idea to muck around with files being proxied, but it’s probably fine assuming all the parsing is done correctly. It does introduce yet another place for things to go wrong, though. I mean, comments aren’t really that important, are they? Well, if you care about open source licensing, they are. Take a look at this POM from Central:

 <!--

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<artifactId>maven</artifactId>
<groupId>org.apache.maven</groupId>
<version>2.0.10</version>
</parent>

Now take a look at the first few lines of the same POM from JFrog's public repository:

<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
<parent>
...

The License header of the file has been completely stripped away.  I was pretty sure that this might be a violation of the license itself, so I checked the Apache License at http://www.apache.org/licenses/LICENSE-2.0.

4.2  You must cause any modified files to carry prominent notices stating that You changed the files; and

4.3 You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works

I am not a lawyer, but I interpret this to mean that if you have this option turned on, and you are distributing these POMs to anyone else, you may be violating the license of the artifacts being proxied. This example uses the Apache License (ASL), a rather liberal license as they go, but as an active participant in the ASF, I can tell you that the organization takes licensing issues very seriously. The fact is that headers from any POM would be dumped, and most licenses out there probably frown upon this. Some of you are going to shrug this off as a minor problem, and maybe it is, but this is the sort of minor issue that will make a legal compliance department go berserk. But stripping licenses off of POMs wasn't really the main issue; it was just something I stumbled on while trying to find a solution to the problem with PGP signatures.
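As a rough illustration of the check described in this section, the sketch below compares the comments that precede the root element in two copies of a POM and reports whether a header present in the original is missing from the proxied copy. The function names and the two abbreviated POM strings are made up for this example; they are stand-ins for the files at the URLs above, not the full files, and the regex-based comment extraction is a simplification rather than a full XML parse.

```python
import re

# Matches XML comments; DOTALL lets a comment span multiple lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def header_comments(pom_text: str) -> list[str]:
    """Return the comments that appear before the <project> root element."""
    root_start = pom_text.find("<project")
    prolog = pom_text if root_start == -1 else pom_text[:root_start]
    return [body.strip() for body in COMMENT_RE.findall(prolog)]


def license_header_lost(original: str, proxied: str) -> bool:
    """True if the original had a header comment that the proxied copy lacks."""
    proxied_headers = header_comments(proxied)
    return any(h not in proxied_headers for h in header_comments(original))


# Hypothetical, abbreviated stand-ins for the two POMs discussed above.
central_pom = """<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.
-->
<project>
<modelVersion>4.0.0</modelVersion>
</project>"""

rewritten_pom = """<?xml version="1.0" encoding="UTF-8"?>
<project>
<modelVersion>4.0.0</modelVersion>
</project>"""

print(license_header_lost(central_pom, rewritten_pom))  # True
```

A check like this only tells you that a header went missing; whether that matters legally is still a question for whoever handles license compliance.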

POM Rewriting and PGP Signatures

Setting aside the license issue for a moment, let’s go back to the procurement issue that was reported. Now try getting the signature file for this artifact so you can validate it hasn’t been tampered with.  The asc file should have a GPG signature that was created with a publicly accessible key.  Click on the following URL on central to see an example.

http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc

OK so far? (That's my signature, FWIW.) Here’s the crux of the issue. Click on the same artifact in the Artifactory proxy of Central below:

http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc

At the time of writing, I get:

HTTP Status 500 - 

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

java.lang.IllegalArgumentException: Checksum type not found for path org/apache/maven/apache-maven/2.0.9/apache-maven-2.0.9.pom.asc
org.artifactory.engine.DownloadServiceImpl.respondForChecksumRequest(DownloadServiceImpl.java:214)
org.artifactory.engine.DownloadServiceImpl.respond(DownloadServiceImpl.java:176)
org.artifactory.engine.DownloadServiceImpl.process(DownloadServiceImpl.java:122)
sun.reflect.GeneratedMethodAccessor93.invoke(Unknown Source)

My valid signature that exists on Central can no longer be retrieved through the proxy. I have no doubt they will fix the crash. However, the problem still stands: how can you have a web of trust that links back to the original developer when proxies in the middle are rewriting the artifact and stripping (or regenerating) the PGP signature? Even if you trust your instance, how can you validate that the signature was correct for the inbound artifact before it was rewritten? What if you’re proxying from someone else who happens to be using Artifactory? Did they Trojan you, or just unwittingly break the web of trust?

If you download things from the internet, validating PGP signatures isn't something you should think about doing; it is something you need to do. It is the only way to guarantee that the artifacts from a remote repository are sound, and Sonatype has invested a great deal of time into making sure that artifacts added to the Central Maven repositories, the Apache repositories, and the Codehaus repositories are all accompanied by valid PGP signatures made with keys that are on a public keyserver. In addition to that, the ASF takes the idea of building a web of trust very seriously. You shouldn't sign an ASF release unless you've had your key signed by someone in the ASF's web of trust at a key signing event (PGP keys are best signed only if you can verify someone's signature face-to-face). It seems a shame to throw away all of that work just to "clean" the POM of repository elements.
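For readers who have not verified a detached signature before, here is a minimal sketch of what that validation looks like when scripted. It assumes the `gpg` command-line tool is on the PATH and that the signer's public key has already been imported into the local keyring; the helper names are invented for this example, and the file names in the usage note are only placeholders for a downloaded POM and its .asc file.

```python
import subprocess


def gpg_verify_command(artifact_path: str, signature_path: str) -> list[str]:
    """Build the argument list for verifying a detached signature."""
    # `gpg --verify <signature> <signed-file>` checks a detached .asc signature
    # against the exact bytes of the signed file.
    return ["gpg", "--verify", signature_path, artifact_path]


def signature_is_valid(artifact_path: str, signature_path: str) -> bool:
    """Run gpg and report whether it accepted the signature (exit code 0)."""
    result = subprocess.run(
        gpg_verify_command(artifact_path, signature_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling `signature_is_valid("apache-maven-2.0.9.pom", "apache-maven-2.0.9.pom.asc")` on files fetched directly from Central would be the expected usage. Because the signature covers the file's exact bytes, a POM that has been reformatted in transit cannot be expected to verify against the original .asc.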

Again, JFrog has written publicly that the only reason this POM rewriting is necessary is because they think that Maven is broken by design. But their fix throws away the web of trust that makes it possible to validate the contents of a repository using original PGP keys from project developers. We've considered similar changes in the past, but because we are responsible for maintaining some of these source repositories, we are forced to think about the ramifications of our changes for the community. Building a repository manager that just "throws out" PGP signatures for POMs seems to me to be irresponsible when we're starting to gain traction on the difficult job of making sure that new artifacts added to Central have PGP signatures.

Artifactory Produces Non-standard Indexes

We also had some reports of odd indexing behavior. The original index format was a Lucene 2.3 binary file zipped up in a convenient archive. This created a problem because if you want to upgrade to a newer version of Lucene, you can no longer produce the older formatted version. Newer versions of Lucene cannot generate backwards-compatible binary index files. Because the community needed to maintain backwards-compatibility for all older clients, the standard index that is produced by the major public repositories is now a new binary layout completely separate from Lucene. All of this work was done in the Nexus Indexer project, a separate, open-source project that is available under the Eclipse Public License (EPL) and is already integrated into all repository managers. In addition to being a neutral format, this new .gz format also supports incremental indexes. The indexes produced by Artifactory use the old-style Lucene zip, but with a newer version of Lucene. This means they are non-standard and not consumable by all IDE plugins or other index clients.

Another problem we found was that the indexes presented by the "virtual" repos (equivalent to Nexus group indexes) serve up only the index of the last repository in the list. This means that in an enterprise you cannot get an index that contains all artifacts available to you, both internal and external. While you can certainly use the Artifactory search interface, the promise of a repository index is that tools like m2eclipse and other Maven plugins can use this index to quickly locate artifacts that contain particular classes or quickly generate a list of versions for a particular artifactId.
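To make the difference concrete, here is a toy model of the two behaviors. It is only a sketch: a real repository index is a binary file, not a dictionary, and the repository contents, coordinates, and function names below are invented for the example.

```python
def merge_group_index(member_indexes):
    """Union the entries of every member repository index, in order."""
    merged = {}
    for index in member_indexes:
        for artifact, versions in index.items():
            merged.setdefault(artifact, set()).update(versions)
    return merged


def last_member_only(member_indexes):
    """Model of the reported behavior: only the last member's index is served."""
    return dict(member_indexes[-1]) if member_indexes else {}


# Hypothetical indexes: artifact coordinate -> set of known versions.
internal = {"com.example:billing": {"1.0", "1.1"}}
central = {"org.apache.maven:apache-maven": {"2.0.9", "2.0.10"}}

print(sorted(merge_group_index([internal, central])))
# ['com.example:billing', 'org.apache.maven:apache-maven']
print(sorted(last_member_only([internal, central])))
# ['org.apache.maven:apache-maven']
```

With the merged form, a tool searching the group index can find the internal artifact as well as the external one; with the last-member form, the internal artifact is invisible to index clients.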

Because it is important for all repository managers to produce interoperable repository indexes, we've decided to donate the Nexus Indexer code to the Apache Software Foundation. The Nexus index is the standard format for a Maven repository and is integrated into Archiva, Nexus, and Artifactory, so it just makes sense that the code that creates this index be moved to an open, transparent community like Apache. This will increase the visibility of the Nexus Indexer code for people that actively participate in the Maven community.

Artifactory Breaks Wagon

Maven and the Maven Ant Tasks use something called Wagon to transfer files to and from a repository. It is the "transport abstraction that is used in Maven's artifact and repository handling code", and it has providers for SCP, HTTP, FTP, and file. Any time Maven sees a URL, the Maven Wagon component handles the transfer. I won't go into the gory details of this component, but one of the things that a repository manager needs to do is provide some sort of file list for a directory. All of the other protocols with Wagon providers have some way to get a directory listing. The basic subset of HTTP that is supported by all web servers does not have this command, so the HTTP wagon relies on the repository returning a list of links to the folder's contents.
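The idea of deriving a file list from a page of links can be sketched in a few lines. This is not Wagon's actual implementation, just an illustration of the technique; the class and function names and the sample HTML are assumptions made for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_directory(html: str) -> list[str]:
    """Return entries of a directory-listing page, skipping navigation links."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        link for link in collector.links
        if not link.startswith(("?", "/", "../")) and "://" not in link
    ]


# Hypothetical directory-listing page, as a plain web server might return it.
listing = """<html><body>
<a href="../">../</a>
<a href="apache-maven-2.0.10.pom">apache-maven-2.0.10.pom</a>
<a href="apache-maven-2.0.10.pom.asc">apache-maven-2.0.10.pom.asc</a>
</body></html>"""

print(list_directory(listing))
# ['apache-maven-2.0.10.pom', 'apache-maven-2.0.10.pom.asc']
```

The approach only works if a request for the folder URL returns such a page of relative links; a response that redirects elsewhere gives the client nothing to parse.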

Instead of returning such a list of folder contents, Artifactory tries to redirect the client to the UI. It doesn't return a file list, and anything in Maven that relies on Wagon's ability to get a file list will fail. In other words, anything in Maven or any Maven plugin that uses the wagon.getFileList() interface will break when you are using Artifactory. You can see it here:

[INFO] Scanning remote file system: http://repo1.maven.org/maven2/org/apache/maven/apache-maven/2.0.10/ ...
[INFO] apache-maven-2.0.10-bin.tar.bz2
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc.md5
[INFO] apache-maven-2.0.10-bin.tar.bz2.asc.sha1
[INFO] apache-maven-2.0.10-bin.tar.bz2.md5
[INFO] apache-maven-2.0.10-bin.tar.bz2.sha1
[INFO] apache-maven-2.0.10-bin.tar.gz
[INFO] apache-maven-2.0.10-bin.tar.gz.asc
[INFO] apache-maven-2.0.10-bin.tar.gz.asc.md5
[INFO] apache-maven-2.0.10-bin.tar.gz.asc.sha1
[INFO] apache-maven-2.0.10-bin.tar.gz.md5
[INFO] apache-maven-2.0.10-bin.tar.gz.sha1
[INFO] apache-maven-2.0.10-bin.zip
[INFO] apache-maven-2.0.10-bin.zip.asc
[INFO] apache-maven-2.0.10-bin.zip.asc.md5
[INFO] apache-maven-2.0.10-bin.zip.asc.sha1
[INFO] apache-maven-2.0.10-bin.zip.md5
[INFO] apache-maven-2.0.10-bin.zip.sha1
[INFO] apache-maven-2.0.10-sources.jar
[INFO] apache-maven-2.0.10-sources.jar.asc
[INFO] apache-maven-2.0.10-sources.jar.asc.md5
[INFO] apache-maven-2.0.10-sources.jar.asc.sha1
[INFO] apache-maven-2.0.10-sources.jar.md5
[INFO] apache-maven-2.0.10-sources.jar.sha1
[INFO] apache-maven-2.0.10.jar
[INFO] apache-maven-2.0.10.jar.asc
[INFO] apache-maven-2.0.10.jar.asc.md5
[INFO] apache-maven-2.0.10.jar.asc.sha1
[INFO] apache-maven-2.0.10.jar.md5
[INFO] apache-maven-2.0.10.jar.sha1
[INFO] apache-maven-2.0.10.pom
[INFO] apache-maven-2.0.10.pom.asc
[INFO] apache-maven-2.0.10.pom.asc.md5
[INFO] apache-maven-2.0.10.pom.asc.sha1
[INFO] apache-maven-2.0.10.pom.md5
[INFO] apache-maven-2.0.10.pom.sha1
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 32 seconds
[INFO] Finished at: Mon Jan 04 17:17:11 EST 2010
[INFO] Final Memory: 8M/47M
[INFO] ------------------------------------------------------------------------

C:\svn\staging-test>mvn validate
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building staging-test
[INFO] task-segment: [validate]
[INFO] ------------------------------------------------------------------------
[INFO] [wagon:list {execution: upload-javadoc}]
[INFO] Scanning remote file system: http://repo.jfrog.org/artifactory/libs-releases/org/apache/maven/apache-maven/2.0.10/ ...
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Error handling resource

Embedded error: Error transferring file
Server redirected too many times (20)
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch

Summary

Last time I wrote something about Artifactory, the founders of the company came back and called me biased and not objective. Judge for yourself: I've presented some concrete facts in this post.

I have to tell you that the thing that really struck a chord with me and the other engineers at Sonatype was the idea that someone could write a blog post saying that Sonatype is "all talk". It just doesn't make any sense: as a corporation, we've poured resources into the foundational technologies that our competitors use. I'm on the PMC, and I spend a great deal of my time working on the Maven project and stopping Denial of Service attacks on Central; a lot of that time is spent trying to make Maven a better product. A lot of this work involves talking to our competitors about ways to improve Maven and related technologies. To hear someone come at us because we're "all talk" is, frankly, insulting given the hours (no, years) we've put into this open source community.

Published at DZone with permission of Brian Fox, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)


Comments

Stephen Colebourne replied on Fri, 2010/01/08 - 4:20am

It looks like you work for Sonatype. Do you? If so, then not stating that up front (rather than sort of implying it) is rather deceiving to readers.

Brian Fox replied on Fri, 2010/01/08 - 8:32am in response to: Stephen Colebourne

Yes I do, I've edited the first sentence to make that more clear.

Geoffrey De Smet replied on Fri, 2010/01/08 - 11:10am

Are any of the JFrog guys committers in Maven? Keep up the good work on Maven 3 (not just the sonatype committers, all the maven committers).

Brian Fox replied on Fri, 2010/01/08 - 11:31am in response to: Geoffrey De Smet

Shlomi Ben-haim replied on Sat, 2010/01/09 - 2:47pm

Mr. Fox,

You should get your facts right before slandering other products/technologies! This is especially true if you try to imply that Artifactory is doing something illegal.

If you had looked closely, you'd see that Artifactory doesn't modify the original remotely cached POMs stored in the repository (http://repo.jfrog.org/artifactory/repo1/org/apache/maven/apache-maven/2.0.10/apache-maven-2.0.10.pom), so you can always perform reliable license validation (which you do on the repository side anyway - even Nexus supports this).
Thank you for your QA work and "legal" efforts, but we trust our users to know that if they activate an optional feature that modifies a POM in order to help them build, they should nevertheless distribute their product with the original POM (if they distribute it with the POM at all, which is uncommon).
And yes, there are many broken POMs out there, and yes the concept of auto-active repositories in POMs is a serious problem in Maven, and I can tell from experience that many organizations will end up having manually edited POMs in their repository to work around these and other problems.

It also looks like you and the company you work for Sonatype would like developers to have a single-vendor, single technology-stack market! The reason for this post is not to educate users, but to trash Nexus' only powerful competitor. This aggressive, "we know better than anyone" behavior is unfortunately typical when it comes to alternative products and technologies, and you spend a great deal of your time spreading it.
You just voted down a pure technical blog that covers alternative technologies (IDEA/Gradle and not m2Eclipse/Maven - you cleared your vote already, but the original is here). Why would you do that? Also, why do you treat fellow Apache projects, like Archiva, with such contempt, putting your Apache PMC hat on and off, as long as it serves your commercial interests.

You have learned a lot from the experience of Artifactory while building Nexus and are still learning - just recently copying, to a limited degree of usefulness, the concept of artifact-based searchable metadata, adding support for multiple app-servers etc. Maybe you should also learn to ask yourself what you are doing right and what you are doing wrong, and if things that you may try to impose as standards are going in the right direction (making people download security-ignorant, and constantly out-of-date indexes instead of making repositories searchable, wagon not properly following redirects, non-preemptive authentication in wagon etc.)

You think of yourself as a standard maker and a best-practice expert, but you have to learn to deal with criticism and accept the fact that while there are things that you do well, there are also areas where others may do better - whether these are competing products or competing technologies.

We hope that in the future you will respect the competition and its value for all of us.

Miya Nadri replied on Sat, 2010/01/09 - 3:59pm

Brian, for someone who doesn't derive much satisfaction from pointing out problems, it looks like you enjoyed it so much that you actually wrote false arguments. It is also a pity you use your Apache/Maven position to promote Sonatype business goals...
As a consulting company we understand the importance of competition in the open source solutions world. Though we advise our customers to use Artifactory, we also work on sites that pick Archiva or Nexus. However, looking back over a two-year period, I can definitely say that competition improves all products! Stop whining about others and put your efforts into making a better product - that's the real thing the community needs!

Brian Fox replied on Sun, 2010/01/10 - 12:12am

I think two comments here are worth responding to:

First the Gradle vote was clearly an accident. As you noted, I had already cleared my vote before your comment. Somehow I had clicked it when viewing the rising link list and I reset it when I noticed later that day. I never noticed before that you could vote directly in the rising link tab...which frankly seems pointless, why let you vote without first reading an article or at least the summary?

I don't vote down things simply because I don't like the technology, and voting down a blog about adding Gradle support to IDEA is completely pointless, since I don't use either of them (though I'm flattered that you watch me enough to take screenshots of votes).

For the record, I have nothing against Gradle or IDEA and neither does Sonatype. Take a look at many of the talks Jason has done, even he says use what you want, we just care about interoperability at the repository layer...which btw was the overarching theme of this post.

Second, on copying your property support, I can't fault you for thinking this was a copy given the timing of releases, but we started RDF-based indexing to support the metadata in July. Alin has been working on this type of technology for years and it was one of the first things he did when joining Sonatype. Unfortunately for us, the 1.4 release held up some of this work. You don't even have to take my word for it, see the history here and here. It looks like your property support was released on October 29th, long after we were done developing the plugin. I could point out plenty of examples in the opposite direction, but I don't see any point...

I think Paul Graham would sum up your comment as a series of DH1 Ad Hominem attacks, since you didn't really address any of the key technical facts that I pointed out.
