Reddit the open-source software

ketralnis has responded to this post here and throughout this thread.

Occurrences of “reddit the open-source software” have been abbreviated to “reddit OSS”. – Nov. 19, 2010

I use reddit, as in reddit the open-source software, for a website that doesn’t get much traffic for several reasons. reddit OSS is one of the bigger reasons. I want to talk about reddit OSS and its management for a moment.

reddit OSS is published at http://github.com/reddit and http://code.reddit.com. reddit.com usually works pretty well, but reddit OSS is very unfriendly to anyone who is not reddit.com.

There has not been a push for about a month, and before that, there had not been a push since mid-July, despite “planning on a much more sane release schedule for future patches (much closer to ‘weekly’ rather than ‘epoch modulo 10Ms’).” The long lag time between pushing changes makes code merges when a new version eventually does get pushed a serious undertaking, especially for those who run hobby or part-time sites (as most running the reddit open-source platform would be). Each time I have updated my reddit installation to a new HEAD it has been two or three days of configuration, re-merging, and bug-squashing before the updated codebase was working as expected; recently, subtle failures occurred while running ads for the site and essentially made it impossible to post comments. If changes were pushed in smaller increments, the same necessary merges would be much easier to handle; merging three or four changes is much simpler than merging 60-70+.

Merges get even more complicated because customizing reddit in even the most basic ways means hacking up several base code files that contain a lot of other stuff. When you clone reddit from git, the clone comes with the same ads that run on reddit, and the only way to remove them is to edit the file that contains them, a file that git tracks and that therefore clashes on merges. You can mark it with git update-index --assume-unchanged, which is probably safe in this case since that file hasn’t been updated in over two years, but that is still extra hassle, and it keeps all future changes to the file, some of which may be important, from applying automatically.
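
For anyone who hasn’t used that flag, here is a minimal sketch of setting and clearing it from a script. The file path is a placeholder of mine (substitute whichever file you actually edited), and the helper names are made up for illustration; the only real thing here is git’s own update-index command and its two flags.

```python
import subprocess


def assume_unchanged_cmd(path, undo=False):
    """Build the git command that sets (or clears) the assume-unchanged bit."""
    flag = "--no-assume-unchanged" if undo else "--assume-unchanged"
    return ["git", "update-index", flag, "--", path]


def run_in_repo(cmd, repo_dir):
    """Run the command inside a checkout; raises CalledProcessError on failure."""
    subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical path: replace it with the locally edited file.
    print(" ".join(assume_unchanged_cmd("path/to/edited-file.html")))
```

Clearing the bit again (undo=True) before a pull is what lets upstream changes to that file show up as ordinary merge conflicts instead of being silently skipped.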

There are several other instances of things that really should have been cleaned up for reddit OSS but still linger, and as you go through removing all of them, you build up quite a few changes, changes that cause problems when it’s time to pull. You shouldn’t have to sanitize the codebase of the OSS version in the first place; that’s the maintainer’s job.

Most obvious among these things that should have been stripped is the reddit alien. It is all over the place: under the submit link button, under the create a subreddit button, in the thumbnail placeholder, and so on. As far as I know the reddit alien is still held by Conde Nast/reddit corporate under an All Rights Reserved copyright license, as one might expect for a company’s logo. The terms “reddit”, “subreddit”, etc. appear throughout the site, creating potential trademark liabilities.

If a website that runs reddit OSS starts to gain momentum, how long do we expect the lawyers at Conde Nast to abide usage of the reddit name and logo on a website over which they have no control, especially if that site infringes on reddit.com’s primary audience? Why can’t they draw a distinct alien for reddit OSS or just include generic images and icons from Tango et al? Either would be much better. My site has been going for almost a year and I’m still finding the term “reddit” sprinkled in odd places, despite going through the translation file a few times. It’s hard-coded in some spots.

Then, to run reddit OSS, one must use memcached, Cassandra, an AMQP server like rabbitmq, PostgreSQL, and a handful of paster daemons included with reddit. Those daemons are currently configured to run with daemontools, so unless you want to spend a while converting the current scripts/daemons, you must also install and use daemontools. Furthermore, running these daemons is non-obvious, and it was not required when I originally pulled, so it took me a while to figure out that a lot of the weird bugs I got resulted from not running them. These daemons are mostly for caching as far as I know, but if you don’t have them in place, weird things like disappearing thumbnails and comments will befall you. The commit messages I saw did not spell this out in big shiny letters, and the overall documentation is poor.
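
Since missing services show up as weird bugs rather than clear errors, a quick reachability check can save some head-scratching. This is only a sketch under my own assumptions: the ports are the stock defaults for each service as I know them, not values taken from reddit’s configuration, and the function names are made up. It also says nothing about the paster daemons themselves; for those, daemontools’ own svstat is the thing to look at.

```python
import socket

# Assumed stock default ports; adjust to match your own configuration.
SERVICES = {
    "memcached": 11211,
    "Cassandra (Thrift)": 9160,
    "RabbitMQ (AMQP)": 5672,
    "PostgreSQL": 5432,
}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="127.0.0.1", services=SERVICES):
    """Map each service name to whether its port accepted a connection."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:20} {'up' if up else 'NOT REACHABLE'}")
```

An open port only proves that something is listening, not that it is configured the way reddit expects, so treat a clean run as a first filter and nothing more.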

reddit.com does almost no testing of reddit OSS. They just push out what they run on reddit.com. Many times in #reddit-dev I have seen “we haven’t tested it that way but it should work…” before someone describes a bug or submits a patch. reddit does not test reddit in a conventional environment.

In the October update, reddit merged several contributed patches, but prior thereto it was rather rare, only occurring a couple of times on a couple of patches (from the github history). There are still a lot of changes out there that would do well to be merged, but reddit.com is trying to keep the codebase unified (despite its super-ugly squash commits that get pushed out in the “weekly” updates), so if your patch would help most users of reddit OSS but not reddit.com, it won’t get merged. This can be good in some cases — it forced me to produce a more scalable database reconnect priority patch, for instance — but it can also mean that more sensible defaults or caching mechanisms for sites that are not reddit.com would be rejected.

The reddit guys insist that their number one priority is reddit.com and almost any time someone brings up a push of reddit.com to the OSS version or merging of a patch or whatever in #reddit-dev, ketralnis is adamant that there is just no time for that. reddit is clearly understaffed and reddit OSS is largely neglected.

There’s not necessarily anything wrong with that, but all of this means that reddit OSS is in prime condition for a fork. However, ketralnis does not think a fork is a good idea. Here is a snippet from IRC, with pieces omitted for brevity and coherence:

(01:03:12 AM) sjuxax: I am planning on forking reddit sometime soon fyi
(01:05:07 AM) ketralnis: I wouldn’t recommend that

(01:05:12 AM) sjuxax: why?

(01:05:21 AM) ketralnis: It’d be a nightmare to maintain against our code-releases, for one

(01:05:47 AM) ketralnis: For another, the license make it difficult to divorce from our brand

(01:05:55 AM) sjuxax: Well it’s already a nightmare to merge with the six-month release cycles and big changes you guys make.

(01:06:12 AM) ketralnis: Agreed, and we should do less of that

(01:06:14 AM) sjuxax: The license basically just requires the attribution at the bottom, right?

(01:06:39 AM) ketralnis: If you’re planning on forking it, you should actually read it. It’s not a long one

(01:06:43 AM) sjuxax: so we can leave that, but the alien is all over. Obviously the license won’t let us get rid of the powered by reddit logo, but the rest should be free to go

(01:06:51 AM) sjuxax: I have read it in the past, but it’s been a while

(01:07:16 AM) ketralnis: I understand where you’re coming from, but it would harm our open-source development to have it forked

(01:08:43 AM) sjuxax: Well I would prefer to keep upstream and the fork at least somewhat compatible

(01:08:58 AM) sjuxax: so hopefully most patches could still go both ways

(01:10:17 AM) sjuxax: but yeah, uh, sorry. reddit has neglected its open-source users imo so a fork is inevitable when you get serious users; that’s why we use OSS software; if the maintainer isn’t taking care of it, someone else can

(01:10:47 AM) ketralnis: We are taking care of it, in that it’s what’s running our live site, right now. 14 million pageviews yesterday.

(01:11:04 AM) ketralnis: So I’d say it’s holding up rather well under its current maintanence

(01:11:21 AM) sjuxax: OK, you are taking care of your reddit installation, you are running reddit for reddit which is fine if that’s what you want to do

(01:11:25 AM) ketralnis: The right solution is for me to set aside a day to merge up with public, not to go forking it

(01:11:28 AM) sjuxax: but it is not attractive as an option for not-reddit

(01:11:43 AM) ketralnis: I’m telling you, forking us will hurt reddit.

(01:11:50 AM) sjuxax: but you don’t set aside that day often enough; you were going to do it weekly but now it’s been months again

(01:12:57 AM) sjuxax: reddit as an open-source project is either going to get forked or going to continue to limp on. it will be nice for reddit’s reddit, but if things keep going how they have been going, virtually no one is going to use the code you publish.

(01:12:57 AM) ketralnis: I don’t have time to argue this right now. But trust me, you forking reddit will fuck up my week, and probably stall any future open source contribution to reddit.

(01:13:53 AM) ketralnis: Forking it will make that situation worse by losing the only developers *paid* to contribute to it from your fork, and any open source developers from either

So reddit corporate would not be happy to see a fork rise up, but what choice do users of reddit OSS have? Things are definitely not good the way they are now and I think that a fork is ultimately inevitable unless reddit revises their policies, allows some divergence, and finally takes the open-source side of things seriously.

Is there much interest in a fork out there? There are lots of good contributions on github that remain unmerged, and a fork would be more active about merging these and especially merging changes that enhance the platform for smaller sites. Once someone gets reddit.com-level traffic, they can switch the platform to the official reddit OSS and then all of the onerous/tricky/annoying/monstrous stuff that is employed by reddit to allow caching and survival under that kind of traffic will be beneficial.

The paths before reddit.com/reddit corporate are: A) take reddit OSS seriously, get patching and merging fixed up, make it easier to push out changes, and then maintain the open-source version frequently and well, including possible divergences where they benefit the OSS user; B) stay the course until someone forks, with unclear ultimate consequences (ketralnis seems to think it would mean the end of commercially funded development entirely); or C) stay the course until everyone gives up on reddit OSS and the project withers and dies. What’ll it be?

Facebook doin’ it wrong

Above is a picture of Facebook doin’ it wrong.

This is what happens when you leave yourself logged in too long: this is the “timeout” screen. You can see its utter uselessness by observing that all around it is your friends’ confidential data, intended only for those approved to see it, not to mention some information about your own account.

That means if you leave this up at a computer lab, while Facebook will cause your session to die, which is good, they’ll leave your newsfeed containing arbitrary private data on-screen, which is really, really bad.

Even more hilarious is the “cancel” button, which causes this dialog to disappear and the one post obstructed by it to become visible.

Imagine if your bank did this.

NoSQL v. SQL is the worst holy war ever.

This seems to be filled with religious contention, as demonstrated at http://news.ycombinator.com/item?id=1163039 . Both sides are talking past each other, so I want to lay it out flat.

First of all, while NoSQL and RDBMS can sometimes exclude one another, they should not be seen as adversaries. NoSQL is designed to address a certain problem space and RDBMS is designed to address another. Both can be an important part of one infrastructure. So all of the resentment between sides is pointless.

Relational databases scale. NoSQL databases scale. Both are scalable and tunable, depending on the situation. Sometimes an RDBMS will be better for your project (yes, even in performance). Sometimes a NoSQL datastore will be better for your project.

NoSQL datastores like CouchDB and MongoDB are developed by competent developers. They are used by competent developers.

Relational and SQL-bearing databases like SQL Server and PostgreSQL are developed by competent developers. They are used by competent developers.

NoSQL offers a barebones solution for people whose primary concerns are speed and load. RDBMS offer a full-fledged solution for people whose primary concerns are data integrity and interrelatibility.

There might be a place in your organization for both!

There is no need to get haughty about this. Pick the design that works best for your problem set. There is no need for one to eliminate the other. Both are useful.

Good software development is all about good judgement. Anyone can learn syntax rules and throw together something that kind-of-sort-of works, but a good developer will know when to deploy one thing and when to deploy another. Keep your options open and stop the silliness.

NoSQL v. SQL is the worst holy war ever.

Mozilla needs to step it up

I’m currently using the Chromium developer builds for Linux and it’s amazing how Chromium is still so much faster than Firefox even though Google Chrome has been out for over a year now and Mozilla just performed a major release this summer.

It’s pretty obvious that TraceMonkey in its present state doesn’t hold up to either V8 or SquirrelFish Extreme (which score rather similarly in my experience, usually with SFX ahead of V8 by a relatively slight margin). I understand that TraceMonkey is much more competitive on Windows, but I almost never use Windows so I don’t care to look into that.

Mozilla really needs to focus on getting consistently good performance if they want to remain relevant.

Chromium is a godsend because of the competition and development it has both encouraged and provided in regard to in-browser JavaScript VMs. It is crucial to the future of the web that we get this show on the road, because fast JavaScript and HTML 5 canvas means the end of proprietary, patent-encumbered necessary evils like Flash and Silverlight. It’s almost impossible to browse without Flash anymore, and that must change.

I’m annoyed at Mozilla because despite their overtures and aggrandizing, Firefox is improving very slowly, and still can’t seem to cope with many of the same demos that Chrome 1.0 was chewing through without issue.

The sad thing is that I don’t really want to switch to Chromium and I don’t want the world to switch to it either. Google has way too much control as it is, with access to almost everyone’s email, search history, etc., and the ability to effectively kill off anyone who depends on referrals from search traffic (most sites see 80%-90% of external search referrals from Google) and Firefox already has thousands of good extensions and themes, not to mention a slight rapport and installed base in the general public.

But Chromium is just so much faster and safer. Even if I could bring myself to ignore the 250% speed difference in _just_ the JavaScript VM (no mention here of Chromium’s vastly faster user interface), there is stability: Firefox has been crashing a lot lately due to erroneous packaging by my distro, whereas if and when Chromium crashes, it only brings down the affected tab and everything else remains intact, which, at least this week, has made browsing much more pleasant.

Chromium’s every-tab-as-a-process technique also makes exploits much more difficult.

These are the results I just got from Sunspider, against the latest available Chromium and Firefox 3.7 nightly builds on an up-to-date Arch Linux install with kernel 2.6.31.

This is a 32-bit Chromium against a 64-bit Firefox, but the 32-bit to 32-bit results were similar and actually a bit less favorable to Firefox.

TEST                   COMPARISON            FROM                 TO             DETAILS
=============================================================================
** TOTAL **:           2.22x as fast     1092.2ms +/- 4.7%   492.6ms +/- 3.6%     significant
=============================================================================

  3d:                  2.10x as fast      154.4ms +/- 1.5%    73.4ms +/- 4.3%     significant
    cube:              1.87x as fast       47.2ms +/- 6.0%    25.2ms +/- 5.4%     significant
    morph:             1.40x as fast       35.0ms +/- 0.0%    25.0ms +/- 7.0%     significant
    raytrace:          3.11x as fast       72.2ms +/- 2.8%    23.2ms +/- 5.9%     significant

  access:              3.50x as fast      130.8ms +/- 1.6%    37.4ms +/- 6.5%     significant
    binary-trees:      20.0x as fast       40.0ms +/- 3.1%     2.0ms +/- 44.0%     significant
    fannkuch:          4.00x as fast       55.2ms +/- 1.9%    13.8ms +/- 9.9%     significant
    nbody:             1.28x as fast       23.6ms +/- 2.9%    18.4ms +/- 3.7%     significant
    nsieve:            3.75x as fast       12.0ms +/- 12.7%     3.2ms +/- 17.4%     significant

  bitops:              ??                  36.6ms +/- 6.2%    37.0ms +/- 4.1%     not conclusive: might be *1.01x as slow*
    3bit-bits-in-byte: ??                   1.6ms +/- 42.6%     2.4ms +/- 28.4%     not conclusive: might be *1.50x as slow*
    bits-in-byte:      1.18x as fast       10.6ms +/- 6.4%     9.0ms +/- 9.8%     significant
    bitwise-and:       *4.27x as slow*      2.2ms +/- 25.3%     9.4ms +/- 7.2%     significant
    nsieve-bits:       1.37x as fast       22.2ms +/- 8.3%    16.2ms +/- 6.4%     significant

  controlflow:         10.7x as fast       34.4ms +/- 4.1%     3.2ms +/- 17.4%     significant
    recursive:         10.7x as fast       34.4ms +/- 4.1%     3.2ms +/- 17.4%     significant

  crypto:              1.82x as fast       56.8ms +/- 7.0%    31.2ms +/- 6.5%     significant
    aes:               3.57x as fast       33.6ms +/- 8.1%     9.4ms +/- 7.2%     significant
    md5:               1.29x as fast       14.2ms +/- 3.9%    11.0ms +/- 8.0%     significant
    sha1:              *1.20x as slow*      9.0ms +/- 16.9%    10.8ms +/- 9.6%     significant

  date:                2.28x as fast      171.6ms +/- 2.2%    75.2ms +/- 3.9%     significant
    format-tofte:      3.46x as fast      104.4ms +/- 3.4%    30.2ms +/- 1.8%     significant
    format-xparb:      1.49x as fast       67.2ms +/- 2.4%    45.0ms +/- 5.9%     significant

  math:                *1.05x as slow*     45.4ms +/- 2.4%    47.6ms +/- 3.5%     significant
    cordic:            1.06x as fast       20.2ms +/- 2.8%    19.0ms +/- 6.5%     significant
    partial-sums:      *1.10x as slow*     18.8ms +/- 3.0%    20.6ms +/- 3.3%     significant
    spectral-norm:     *1.25x as slow*      6.4ms +/- 17.4%     8.0ms +/- 0.0%     significant

  regexp:              4.44x as fast       78.2ms +/- 12.2%    17.6ms +/- 3.9%     significant
    dna:               4.44x as fast       78.2ms +/- 12.2%    17.6ms +/- 3.9%     significant

  string:              2.26x as fast      384.0ms +/- 11.2%   170.0ms +/- 4.8%     significant
    base64:            *1.60x as slow*     11.0ms +/- 8.0%    17.6ms +/- 6.3%     significant
    fasta:             2.49x as fast       72.8ms +/- 3.9%    29.2ms +/- 7.6%     significant
    tagcloud:          2.83x as fast      102.4ms +/- 6.7%    36.2ms +/- 7.4%     significant
    unpack-code:       3.02x as fast      162.6ms +/- 17.7%    53.8ms +/- 3.4%     significant
    validate-input:    -                   35.2ms +/- 18.3%    33.2ms +/- 4.9%
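
If you want to sanity-check the COMPARISON column, it appears to be nothing more than the ratio of the two mean times, phrased as “as fast” when TO is quicker and “as slow” when it isn’t. The sketch below recomputes a few rows on that assumption; the function name is mine, the numbers are copied from the table, and given everything said above I’m reading FROM as Firefox and TO as Chromium.

```python
def compare(from_ms, to_ms):
    """Describe how the TO mean relates to the FROM mean, SunSpider-style."""
    if to_ms <= from_ms:
        return f"{from_ms / to_ms:.2f}x as fast"
    return f"{to_ms / from_ms:.2f}x as slow"


# (FROM mean, TO mean) in milliseconds, copied from the table above.
rows = {
    "TOTAL": (1092.2, 492.6),
    "raytrace": (72.2, 23.2),
    "bitwise-and": (2.2, 9.4),
    "base64": (11.0, 17.6),
}

for name, (from_ms, to_ms) in rows.items():
    print(f"{name:12} {compare(from_ms, to_ms)}")
```

This ignores the +/- percentages entirely, which is why the real tool refuses to call rows like bitops conclusive even though a plain ratio can still be computed for them.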

Removing Adobe Drive CS4 in Windows

So, after sitting around for more than an hour waiting for Adobe’s indecently bloated CS4 installer to finish installing Photoshop and Flash, I right-click on a file and am rewarded with a lovely little “Adobe Drive CS4” context option. I definitely didn’t want this. I’m also upset that, when all I asked for was Flash and Photoshop, two new Adobe submenus appeared on my Start list, one containing only “Adobe Media Player” and another containing nine items, only two of which I asked for, plus another top-level icon for “Acrobat.com”. So Adobe sucks.

Anyway, it seems that the recommended method to remove Adobe Drive CS4 from the context menu is to open the installer and uninstall it (funny that I wasn’t asked about this the first time), but if you’re running Windows in a virtualized guest like me and don’t want to wait the ten minutes it takes Adobe to “[check your system profile]” and “[Load] Setup”, remove these two registry keys:

HKEY_CLASSES_ROOT\AllFilesystemObjects\shellex\ContextMenuHandlers\{C95FFEAE-A32E-4122-A5C4-49B5BFB69795}
HKEY_CLASSES_ROOT\Directory\Background\shellex\ContextMenuHandlers\{C95FFEAE-A32E-4122-A5C4-49B5BFB69795}

and you should be freed from the offending entry.
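
If you’d rather not click through regedit by hand, one option is a .reg file that deletes both keys on import. Here is a small sketch that generates one; the two key paths are the ones listed above, while the function name is my own invention. It relies on the .reg convention that a leading minus sign inside the brackets marks a key for deletion.

```python
HANDLER_GUID = "{C95FFEAE-A32E-4122-A5C4-49B5BFB69795}"

KEYS = [
    r"HKEY_CLASSES_ROOT\AllFilesystemObjects\shellex\ContextMenuHandlers" + "\\" + HANDLER_GUID,
    r"HKEY_CLASSES_ROOT\Directory\Background\shellex\ContextMenuHandlers" + "\\" + HANDLER_GUID,
]


def build_reg_file(keys):
    """Return .reg text in which each key is marked for deletion."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key in keys:
        # A leading '-' inside the brackets tells regedit to delete the key.
        lines.append(f"[-{key}]")
        lines.append("")
    return "\r\n".join(lines)


print(build_reg_file(KEYS))
```

Save the output to a file ending in .reg and import it on the Windows guest. Exporting the two keys from regedit first gives you a way back if you change your mind.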

Alas, Adobe Drive CS4 is still sitting around somewhere sucking up space uselessly, but we’ll leave well enough alone for now.