Rails inserts BLOB data into field with expected TEXT type

I encountered this on Rails 4.0.1 with Ruby 1.9.3. It happens because ActiveRecord sees the data you’re inserting as binary; it’s a string encoding issue.

In my particular case, the problem arose from sending the output of Digest::MD5.hexdigest() (disclaimer: do not use MD5 in security-sensitive applications) directly to the ORM. There is a bug in Rubies < 2.0.0 that causes hexdigest to return its output as ASCII-8BIT instead of US-ASCII, and ASCII-8BIT is interpreted by many gems, including ActiveRecord, as binary data.
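
To see why the same 32 hex characters can be handled two different ways, here is a small sketch. The looks_binary? helper is hypothetical and only approximates the kind of check a library might make; it is not ActiveRecord’s actual code. The point is that an ASCII-8BIT string and a US-ASCII string can hold identical bytes and even compare equal while carrying different encoding labels.

```ruby
# Hypothetical helper: approximates the kind of test a library might use to
# decide that a String holds binary data (e.g. to store it as a BLOB).
# This is an illustration, not ActiveRecord's real implementation.
def looks_binary?(value)
  value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT
end

hex = "5d41402abc4b2a76b9719d911017c592" # MD5 of "hello"

# Same bytes, two different encoding labels.
pre_2_0  = hex.dup.force_encoding(Encoding::ASCII_8BIT) # hexdigest's result on Ruby < 2.0.0
post_2_0 = hex.dup.force_encoding(Encoding::US_ASCII)   # hexdigest's result on Ruby >= 2.0.0

puts pre_2_0 == post_2_0     # true: ASCII-only strings compare equal across these encodings
puts looks_binary?(pre_2_0)  # true
puts looks_binary?(post_2_0) # false
```

Only the encoding label differs, which is why relabelling the string is enough to change how it gets stored.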

Thankfully, the workaround is quite simple: tack a force_encoding call onto the result of your hexdigest call, and you should be good.

This relabels the hexdigest string with an encoding that is widely interpreted as text, so ActiveRecord, YAML, and other gems will begin handling the string as expected.
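
A minimal sketch of the workaround follows. The text_hexdigest name is hypothetical; the only real ingredients are Digest::MD5.hexdigest and String#force_encoding. Because a hex digest only ever contains the characters 0-9 and a-f, relabelling it as US-ASCII does not change any bytes.

```ruby
require 'digest/md5'

# Hypothetical wrapper around the workaround: on Ruby < 2.0.0, hexdigest
# returns an ASCII-8BIT String, which ActiveRecord treats as binary.
# force_encoding only changes the encoding label; the hex characters are
# plain ASCII, so the bytes stay exactly the same.
def text_hexdigest(input)
  Digest::MD5.hexdigest(input).force_encoding(Encoding::US_ASCII)
end

digest = text_hexdigest("hello")
puts digest          # 5d41402abc4b2a76b9719d911017c592
puts digest.encoding # US-ASCII
```

In a model you would then assign the wrapped value to the text column instead of the raw hexdigest, e.g. record.token = text_hexdigest(some_string), where record, token, and some_string are placeholders. On Ruby >= 2.0.0 the call is harmless, since the string is already US-ASCII.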

Once again, this workaround is not necessary on Ruby >= 2.0.0, which already contains a patch for the Digest gems that specifies the correct encoding. Therefore, an alternate solution is to upgrade to Ruby >= 2.0.0, or compile a Ruby that contains the resolving patch.

Reddit the open-source software

ketralnis has responded to this post here and throughout this thread.

Occurrences of “reddit the open-source software” have been abbreviated to “reddit OSS”. – Nov. 19, 2010

I use reddit, as in reddit the open-source software, for a website that doesn’t get much traffic. There are several reasons for that, and reddit OSS is one of the bigger ones. I want to talk about reddit OSS and its management for a moment.

reddit OSS is published at http://github.com/reddit and http://code.reddit.com. reddit.com usually works pretty well, but reddit OSS is very unfriendly to anyone that is not reddit.com.

There has not been a push for about a month, and before that, there had not been a push since mid-July, despite “planning on a much more sane release schedule for future patches (much closer to ‘weekly’ rather than ‘epoch modulo 10Ms’).” Because changes are pushed so rarely, merging a new version when it finally arrives is a serious undertaking, especially for those who run hobby or part-time sites (as most people running the reddit open-source platform would). Each time I have updated my reddit installation to a new HEAD, it has taken two or three days of configuration, re-merging, and bug-squashing before the updated codebase worked as expected; most recently, subtle failures occurred while running ads for the site and made it essentially impossible to post comments. If changes were pushed in smaller increments, the same necessary merges would be much easier to handle; merging three or four changes is much simpler than merging 60-70+.

Merges get even more complicated because to customize reddit in even the most basic ways, you’ll have to hack up several base code files that contain a lot of other stuff. When you clone reddit from git, the clone comes with the same ads that run on reddit.com, and the only way to remove them is to edit the file that holds them: a file that git tracks and that clashes on merges. You can avoid the clashes with git update-index --assume-unchanged, which is probably safe in this case since that file hasn’t been updated in over two years, but it is still extra hassle, and it stops all future upstream changes to that file from applying automatically, changes which may be important.

There are several other things that really should have been cleaned up for reddit OSS but still linger, and as you go through removing them all, you build up quite a few changes, changes that cause problems when it’s time to pull. You shouldn’t have to sanitize the codebase of the OSS version in the first place; that’s the maintainer’s job.

Most obvious among the things that should have been stripped is the reddit alien. It is all over the place: under the submit link button, under the create a subreddit button, as the thumbnail placeholder, and so on. As far as I know, the reddit alien is still held by Conde Nast/reddit corporate under an All Rights Reserved copyright, as one might expect for a company’s logo. The terms “reddit”, “subreddit”, etc., also appear throughout the site, creating potential trademark liabilities.

If a website that runs reddit OSS starts to gain momentum, how long do we expect the lawyers at Conde Nast to tolerate the use of the reddit name and logo on a website over which they have no control, especially if that site encroaches on reddit.com’s primary audience? Why can’t they draw a distinct alien for reddit OSS, or just include generic images and icons from Tango et al.? That would be much better. My site has been going for almost a year and I’m still finding the term “reddit” sprinkled in odd places, despite going through the translation file a few times; it’s hard-coded in some spots.

Then, to run reddit OSS, one must use memcached, Cassandra, an AMQP server like RabbitMQ, PostgreSQL, and a handful of paster daemons included with reddit. Those daemons are currently configured to run under daemontools, so unless you want to spend a while converting the current scripts, you must also install and use daemontools. Furthermore, the need to run these daemons is non-obvious, and they were not required when I originally pulled, so it took me a while to figure out that a lot of the weird bugs I was seeing resulted from not running them. As far as I know, these daemons are mostly for caching, but if you don’t have them in place, weird things like disappearing thumbnails and comments will befall you. The commit messages I saw did not announce this in big shiny letters, and the overall documentation is poor.

reddit.com does almost no testing of reddit OSS. They just push out what they run on reddit.com. Many times in #reddit-dev I have seen “we haven’t tested it that way but it should work…” before someone describes a bug or submits a patch. reddit does not test reddit in a conventional environment.

In the October update, reddit merged several contributed patches, but before that, merging contributions was rare; judging from the GitHub history, it happened only a couple of times, for a couple of patches. There are still a lot of changes out there that would do well to be merged, but reddit.com is trying to keep the codebase unified (despite the super-ugly squash commits that get pushed out in the “weekly” updates), so if your patch would help most users of reddit OSS but not reddit.com, it won’t get merged. This can be good in some cases (it forced me to produce a more scalable database reconnect priority patch, for instance), but it can also mean that more sensible defaults or caching mechanisms for sites that are not reddit.com get rejected.

The reddit guys insist that their number one priority is reddit.com, and almost any time someone in #reddit-dev brings up pushing reddit.com’s code to the OSS version, merging a patch, or the like, ketralnis is adamant that there is just no time for it. reddit is clearly understaffed, and reddit OSS is largely neglected.

There’s not necessarily anything wrong with that, but all of this means that reddit OSS is in prime condition for a fork. However, ketralnis does not think a fork is a good idea. Here is a snippet from IRC, with pieces omitted for brevity and coherence:

(01:03:12 AM) sjuxax: I am planning on forking reddit sometime soon fyi
(01:05:07 AM) ketralnis: I wouldn’t recommend that

(01:05:12 AM) sjuxax: why?

(01:05:21 AM) ketralnis: It’d be a nightmare to maintain against our code-releases, for one

(01:05:47 AM) ketralnis: For another, the license make it difficult to divorce from our brand

(01:05:55 AM) sjuxax: Well it’s already a nightmare to merge with the six-month release cycles and big changes you guys make.

(01:06:12 AM) ketralnis: Agreed, and we should do less of that

(01:06:14 AM) sjuxax: The license basically just requires the attribution at the bottom, right?

(01:06:39 AM) ketralnis: If you’re planning on forking it, you should actually read it. It’s not a long one

(01:06:43 AM) sjuxax: so we can leave that, but the alien is all over. Obviously the license won’t let us get rid of the powered by reddit logo, but the rest should be free to go

(01:06:51 AM) sjuxax: I have read it in the past, but it’s been a while

(01:07:16 AM) ketralnis: I understand where you’re coming from, but it would harm our open-source development to have it forked

(01:08:43 AM) sjuxax: Well I would prefer to keep upstream and the fork at least somewhat compatible

(01:08:58 AM) sjuxax: so hopefully most patches could still go both ways

(01:10:17 AM) sjuxax: but yeah, uh, sorry. reddit has neglected its open-source users imo so a fork is inevitable when you get serious users; that’s why we use OSS software; if the maintainer isn’t taking care of it, someone else can

(01:10:47 AM) ketralnis: We are taking care of it, in that it’s what’s running our live site, right now. 14 million pageviews yesterday.

(01:11:04 AM) ketralnis: So I’d say it’s holding up rather well under its current maintanence

(01:11:21 AM) sjuxax: OK, you are taking care of your reddit installation, you are running reddit for reddit which is fine if that’s what you want to do

(01:11:25 AM) ketralnis: The right solution is for me to set aside a day to merge up with public, not to go forking it

(01:11:28 AM) sjuxax: but it is not attractive as an option for not-reddit

(01:11:43 AM) ketralnis: I’m telling you, forking us will hurt reddit.

(01:11:50 AM) sjuxax: but you don’t set aside that day often enough; you were going to do it weekly but now it’s been months again

(01:12:57 AM) sjuxax: reddit as an open-source project is either going to get forked or going to continue to limp on. it will be nice for reddit’s reddit, but if things keep going how they have been going, virtually no one is going to use the code you publish.

(01:12:57 AM) ketralnis: I don’t have time to argue this right now. But trust me, you forking reddit will fuck up my week, and probably stall any future open source contribution to reddit.

(01:13:53 AM) ketralnis: Forking it will make that situation worse by losing the only developers *paid* to contribute to it from your fork, and any open source developers from either

So reddit corporate would not be happy to see a fork rise up, but what choice do users of reddit OSS have? Things are definitely not good the way they are now and I think that a fork is ultimately inevitable unless reddit revises their policies, allows some divergence, and finally takes the open-source side of things seriously.

Is there much interest in a fork out there? There are lots of good contributions on GitHub that remain unmerged, and a fork would be more active about merging these, especially changes that enhance the platform for smaller sites. Once a site reaches reddit.com-level traffic, its operators can switch to the official reddit OSS, and at that point all of the onerous/tricky/annoying/monstrous stuff that reddit employs for caching and for surviving that kind of traffic will be beneficial.

The paths before reddit.com/reddit corporate are: A) take reddit OSS seriously, fix up patching and merging, make it easier to push out changes, and then maintain the open-source version frequently and well, including possible divergences where they benefit the OSS user; B) stay the course until someone forks, with unclear ultimate consequences (ketralnis seems to think it would mean the cessation of commercially funded development entirely); or C) stay the course until everyone gives up on reddit OSS and the project withers and dies. What’ll it be?

Hosting shared folder from IIS 7 and VirtualBox

I have a Win7 guest running in VirtualBox. I’m working on a .NET project and got really sick of rebooting for Windows, so after some failed attempts to forward the ports for MSSQL and connect remotely, I have configured the guest to host my whole application while I develop and build in MonoDevelop.

I didn’t want to have to make a commit and push/pull every time I wanted to test, so I configured IIS 7 to use a shared folder from my host. However, conventional \\vboxsvr and VirtualBox shared folders do NOT work; IIS refuses to read the files, even after trying everything in the relevant Microsoft KB articles.

There is probably something wrong or incomplete in the VirtualBox implementation, because if you share the folder via Samba everything works swimmingly. I am using VirtualBox 3.2.8; if you are trying to use VBox’s shared folders to host a folder for IIS, stop now and set it up via Samba. This should solve any lingering difficulty unresolved by the Microsoft articles.

Once you have your share configured via Samba, make sure that you configure IIS to “Connect as…” the user you’ve set up for Samba with smbpasswd, and that you are using a UNC path name such as \\server\folder (with VirtualBox, the server part will usually be the host machine’s address as seen from the guest) rather than a mapped drive letter like X:, because mapped drive letters only exist for the users that mount them (i.e., your main user, not your IIS user).
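
The UNC-paths-not-drive-letters rule above is easy to check mechanically before you paste a path into IIS. Here is a small, hypothetical Ruby sketch; the helper names and the myhost/projects/webapp values are placeholders, not anything VirtualBox, Samba, or IIS defines.

```ruby
# Hypothetical helpers: IIS should be pointed at a UNC path such as
# \\server\share\dir, not a drive letter such as X:\dir, because a mapped
# drive only exists for the user who mapped it (not for the IIS user).

# A drive-letter path: "X:\site", "x:/site", ...
DRIVE_LETTER = /\A[A-Za-z]:[\\\/]/

# A UNC path: two leading backslashes, a server name, a backslash, a share name.
UNC = /\A\\\\[^\\\/]+\\[^\\\/]+/

def unc_path?(path)
  UNC.match?(path)
end

def drive_letter_path?(path)
  DRIVE_LETTER.match?(path)
end

# Builds \\server\share\sub\dirs from its parts.
def unc_path(server, share, *subdirs)
  "\\\\" + [server, share, *subdirs].join("\\")
end

site = unc_path("myhost", "projects", "webapp")
puts site                             # \\myhost\projects\webapp
puts unc_path?(site)                  # true
puts drive_letter_path?("X:\\webapp") # true
puts unc_path?("X:\\webapp")          # false
```

Note that the Ruby string literals double every backslash; the strings themselves contain the single and double backslashes you would type into IIS.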

You may get another security-related error, which can be resolved by opening the .NET Framework Configuration Manager and enabling FullTrust for the appropriate Code Group (I just enabled it for LocalIntranet, given the inherently local nature of the VirtualBox setup on my development box).

This article may help if you are receiving the following errors:

  • “The requested page cannot be accessed because the related configuration data for the page is invalid”: 0x80070003, 0x80070005, etc.
  • Exception Details: System.Security.SecurityException: Security error. PublicKeyToken=b77a5c561934e089

NoSQL v. SQL is the worst holy war ever.

This debate seems to be filled with religious contention, as demonstrated at http://news.ycombinator.com/item?id=1163039. Both sides are talking past each other, so I want to lay it out plainly.

First of all, while NoSQL and RDBMS can sometimes exclude one another, they should not be seen as adversaries. NoSQL is designed to address a certain problem space and RDBMS is designed to address another. Both can be an important part of one infrastructure. So all of the resentment between sides is pointless.

Relational databases scale. NoSQL databases scale. Both are scalable and tunable, depending on the situation. Sometimes an RDBMS will be better for your project (yes, even in performance). Sometimes a NoSQL datastore will be better for your project.

NoSQL datastores like CouchDB and MongoDB are developed by competent developers. They are used by competent developers.

Relational and SQL-bearing databases like SQL Server and PostgreSQL are developed by competent developers. They are used by competent developers.

NoSQL offers a barebones solution for people whose primary concerns are speed and load. RDBMSes offer a full-fledged solution for people whose primary concerns are data integrity and the relationships between data.

There might be a place in your organization for both!

There is no need to get haughty about this. Pick the design that works best for your problem set. There is no need for one to eliminate the other. Both are useful.

Good software development is all about good judgement. Anyone can learn syntax rules and throw together something that kind-of-sort-of works, but a good developer will know when to deploy one thing and when to deploy another. Keep your options open and stop the silliness.

Joel’s “Duct-Tape Programmer” is the only programmer you should ever hire.

I’ve just finished reading Joel Spolsky’s “Duct Tape Programmers” and jwz’s response to it.

These posts strike a chord with me, especially since Spolsky spends almost the entire article praising “The Duct-Tape Programmer” before urging you not to try it.

I’ve worked on teams where every member urgently insisted on rigid, absolute application of certain tools, citing familiarity as the main benefit. “When all you have is a hammer, everything looks like a nail.”

These teams tend to be weeks behind schedule, blinded by various unique brands of naivety, and in general this type of management produces an unwieldy monolith of a codebase, slow and non-intuitive.

jwz, on the other hand, is a pragmatic generalist who doesn’t fear learning a new thing if it is the right tool for the job. While those who blindly insist on the same old tools are motivated by self-interest, fear, and good performance reviews from uninformed business people using worse-than-useless metrics, the “duct-tape programmer”, epitomized by jwz, is concerned with the performance, utility, and maintainability of the product, which, sadly, often comes at the expense of the high corporate positions reserved for elite sycophants rather than practical, usable realists. jwz’s blogs testify to this clearly.

When Mr. Spolsky implores readers not to follow jwz’s example of judgment and application, he promotes a pervasive negativity and self-decay within a development team. Obsession with fads, hyperfocus on one area of good practice, and other symptoms of programming elitism do no good to someone actually interested in developing or maintaining functional software. “Non-duct-tape” programmers can often rattle off shallow quips to back up their incompetence, but the fact is that attempts to squeeze a round peg through a square hole are always inadequate, even if avoiding them means the big bad “learning curve” must make an appearance. Invest the time up front and do it in a way that at least loosely approximates an intelligent approach, instead of allowing politics and insecurities to hinder development.

I can’t tell you how many hours I’ve seen wasted bastardizing functions deep within the netherparts of CakePHP (for instance) when a better and more suitable custom component could have been built much faster and maintained much more easily. A programmer who does that is a bad programmer, because no matter how well he knows the target platform, he’s going to regularly misapply that knowledge. His technical religion, that CakePHP (for instance) is the ultimate tool and everything must be done the Cake way (a principle that, ironically, gets tarnished when you force yourself to modify core libs because you can’t bring yourself not to use Cake for something), impedes the macro-level goal of “build something that works well, reasonably quickly”.

Spolsky’s description paints the duct-tape programmer as an ignoramus, a programmer who’d just as well stick to pre-1990 languages, methods, and conventions, but that’s clearly not the case with jwz, who was one of the first third-party developers for the Palm Pre. jwz, like all other good programmers, simply knows how and when to avoid cruft and how and when to leverage existing work. This sense of decent judgment is gained only through long experience and technical exploration, the kind a “non-duct-tape” programmer is too afraid to get.

So, when Joel says the “duct-tape programmer”, what he really means is the “pragmatic, profitable programmer, the only kind of programmer you should ever hire or use; a ‘good’ programmer”. These programmers do follow the changes in the field and they do play with new technologies and methods. They simply know when to experiment, when to complicate, when to simplify, and when to ship.

Do yourself a favor and pattern your next hire after jwz; get a programmer with the know-how and the wisdom, because one is useless without the other.