Harry Marr

Notes from a production MongoDB deployment

1 week, 1 day ago — 0 Comments — Permalink

  • mongodb
  • nosql
  • scale

Really interesting post about how Boxed Ice handled some of the issues that appeared when using MongoDB for storing massive datasets (17,810 collections, 43,175 indexes and 664,158,090 documents).

Notes on MongoDB

1 week, 4 days ago — 0 Comments — Permalink

  • mongodb
  • nosql

Nice overview of MongoDB’s capabilities.

Integrating MongoDB and Django

3 weeks, 4 days ago — 0 Comments — Permalink

  • mongoengine
  • mongodb
  • mumblr
  • django

Check out this great introduction to MongoEngine and Mumblr from Kevin Fricovsky.

Introducing MongoEngine

1 month ago — 8 Comments — Permalink

  • mongodb
  • mongoengine
  • nosql
  • python

MongoEngine is a Document-Object Mapper (think ORM, but for document databases) for working with MongoDB from Python. It uses a simple declarative API, similar to that of the Django ORM.

So what does it do?

Here’s a brief run-down of some of the main features of MongoEngine:

  • Document schema declaration and validation
  • An elegant querying syntax, similar to that of Django
  • Document inheritance, with support for “polymorphic querying”
  • Aggregation methods, such as sum and average
  • Advanced query condition combination using Q objects
  • Session and authentication backends for Django

Show me the code!

To define a document, just inherit from the Document class and add some fields:

from datetime import datetime

from mongoengine import DateTimeField, Document, ListField, StringField


class BlogPost(Document):
    title = StringField(required=True)
    slug = StringField(required=True, max_length=250)
    content = StringField(required=True)
    date = DateTimeField(default=datetime.now, required=True)
    tags = ListField(StringField())

To save documents to the database, just instantiate a Document object, fill in the fields, and call save:

post = BlogPost(title='Introducing MongoEngine', slug='introducing-mongoengine')
post.content = 'MongoEngine is a Document-Object Mapper...'
post.tags = ['mongodb', 'mongoengine']
post.save()

To find documents, use the objects attribute of a Document subclass:

latest_posts = BlogPost.objects.order_by('-date')[:25]
mongodb_posts = BlogPost.objects(tags='mongodb')
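
To make the semantics of those two queries concrete, here is a rough plain-Python equivalent that runs without a database. The `posts` list is made-up sample data, and this only mimics the results; the real queries are executed by MongoDB. Note that an equality match on a list field selects documents whose list contains the value.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for stored BlogPost documents
posts = [
    {"title": "A", "date": datetime(2010, 1, 1), "tags": ["mongodb"]},
    {"title": "B", "date": datetime(2010, 2, 1), "tags": ["python"]},
    {"title": "C", "date": datetime(2010, 3, 1), "tags": ["mongodb", "nosql"]},
]

# Like order_by('-date')[:25]: newest first, at most 25 results
latest_posts = sorted(posts, key=lambda p: p["date"], reverse=True)[:25]

# Like objects(tags='mongodb'): keep posts whose tag list contains the value
mongodb_posts = [p for p in posts if "mongodb" in p["tags"]]
```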

How about a tag cloud? Simple:

# Get a dictionary with tags as the keys and frequencies as the values
tag_freqs = BlogPost.objects.item_frequencies('tags')
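
If the shape of that dictionary is unclear, the sketch below builds the same kind of result using only the standard library. It is not how MongoEngine computes it (the real call runs against the database); the `posts` list and the `tag_frequencies` helper are invented for illustration.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for stored BlogPost documents
posts = [
    {"title": "Introducing MongoEngine", "tags": ["mongodb", "mongoengine"]},
    {"title": "Notes on MongoDB", "tags": ["mongodb", "nosql"]},
]


def tag_frequencies(documents):
    """Count how often each tag appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.get("tags", []))
    return dict(counts)


tag_freqs = tag_frequencies(posts)
```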

Every blog needs comments, right?

class Comment(EmbeddedDocument):
    author = StringField()
    content = StringField(required=True)
    date = DateTimeField()

# Modify the previously defined BlogPost document
class BlogPost(Document):
    ...
    comments = ListField(EmbeddedDocumentField(Comment))
    ...

# Let's add a comment; this is performed as an atomic operation
comment = Comment(author=form['author'], content=form['content'])
BlogPost.objects(id=post_id).update(push__comments=comment)
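
As a rough mental model, an update keyword such as push__comments corresponds to MongoDB's $push update operator, which appends to an array on the server instead of reading, modifying and rewriting the whole document. The helper below is a simplified, hypothetical illustration of that mapping in plain Python. It is not MongoEngine's actual implementation and only handles the single operator__field form.

```python
def to_update_document(**updates):
    """Turn keywords like push__comments=value into a MongoDB-style update document."""
    update_doc = {}
    for key, value in updates.items():
        operator, _, field = key.partition("__")
        if not field:
            raise ValueError(f"expected '<operator>__<field>', got {key!r}")
        # Group fields under their "$operator" key, e.g. {"$push": {"comments": ...}}
        update_doc.setdefault("$" + operator, {})[field] = value
    return update_doc


# Hypothetical comment data, shaped like the embedded Comment document
comment = {"author": "alice", "content": "Nice post!"}
update = to_update_document(push__comments=comment)
```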

I could go on, but I’ll keep this post short and to the point. For more information, see the documentation. The source is available on GitHub; fork it and have a play!

Insightful introduction to V8

1 month, 2 weeks ago — 0 Comments — Permalink

  • v8
  • javascript

An interesting, albeit slightly old, video explanation of V8’s use of hidden classes from the VM wizard, Lars Bak.

A successful Git branching model

1 month, 2 weeks ago — 0 Comments — Permalink

  • git
  • development

Great article describing a solid Git workflow. It suggests doing all development in a separate develop branch, keeping master only for production-ready code. The develop branch is merged back into master when it reaches a stable state, and anything that gets merged into master is tagged as a release.

Three other main types of branch are used:

  • “Feature branches” are used to develop individual features for upcoming or distant releases; these branch off and merge back into develop.
  • For preparing releases (last-minute bug fixes, etc.), a release branch is used. From this point on, no major features will be added, and the develop branch will be used for development on the next release.
  • If an existing release needs an urgent fix, a hotfix branch will be created from the last tag on master. When the bug is fixed, this branch will be merged back into master, and the new release will be tagged.

Making Virtualenv Play Nice with Git

1 month, 2 weeks ago — 5 Comments — Permalink

  • git
  • virtualenv
  • bash
  • python

I like to do most of my Python development inside virtualenvs. I also create a Git repository for any project that matters or that will have any kind of continued development. Constantly switching between the different virtualenvs to work on different projects used to be tedious, but this issue was largely solved by the fantastic virtualenvwrapper.

Virtualenvwrapper has certainly improved the situation, but even so, I can’t help but worry that the cd project-x, workon project-x, (do some work), cd .., deactivate workflow is going to lead me to an early grave caused by a severe case of RSI. So in order to retain my good health, I’ve hacked together a bash function that automatically activates a virtualenv when you cd into a Git repository, and deactivates it when you leave the repository.

By default, it assumes that the virtualenv’s name will be the same as the repository’s name, but this can be overridden by creating a file called .venv in the repository’s root directory with the name of another virtualenv in it.

# Automatically activate Git projects' virtual environments based on the
# directory name of the project. Virtual environment name can be overridden
# by placing a .venv file in the project root with a virtualenv name in it
function workon_cwd {
    # Check that this is a Git repo
    GIT_DIR=$(git rev-parse --git-dir 2> /dev/null)
    if [ $? -eq 0 ]; then
        # Find the repo root and check for virtualenv name override
        GIT_DIR=$(\cd "$GIT_DIR" && pwd)
        PROJECT_ROOT=$(dirname "$GIT_DIR")
        ENV_NAME=$(basename "$PROJECT_ROOT")
        if [ -f "$PROJECT_ROOT/.venv" ]; then
            ENV_NAME=$(cat "$PROJECT_ROOT/.venv")
        fi
        # Activate the environment only if it is not already active
        if [ "$VIRTUAL_ENV" != "$WORKON_HOME/$ENV_NAME" ]; then
            if [ -e "$WORKON_HOME/$ENV_NAME/bin/activate" ]; then
                workon "$ENV_NAME" && export CD_VIRTUAL_ENV="$ENV_NAME"
            fi
        fi
    elif [ -n "$CD_VIRTUAL_ENV" ]; then
        # We've just left the repo, deactivate the environment
        # Note: this only happens if the virtualenv was activated automatically
        deactivate && unset CD_VIRTUAL_ENV
    fi
}

# New cd function that does the virtualenv magic
function venv_cd {
    cd "$@" && workon_cwd
}

alias cd="venv_cd"

Note: for this to work you will need virtualenv and virtualenvwrapper installed. To use it, just stick it in your .bashrc somewhere below where your $WORKON_HOME is specified.
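
The name lookup described above (repository directory name by default, with a .venv file as an override) can also be written out in Python, which may be handy for checking the rule in isolation. This is only a sketch: `resolve_env_name` is a hypothetical helper, not part of virtualenv or virtualenvwrapper.

```python
from pathlib import Path


def resolve_env_name(project_root):
    """Return the virtualenv name for a repository root.

    Defaults to the directory name; a .venv file in the root overrides it.
    """
    root = Path(project_root)
    override = root / ".venv"
    if override.is_file():
        # Strip surrounding whitespace, including the trailing newline
        return override.read_text().strip()
    return root.name
```

For a repository at ~/code/project-x with no .venv file, this returns project-x.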

Mumblr is a basic Django tumblelog application that uses MongoDB with MongoEngine. Fork it on Github. Designed and developed by Harry Marr and Steve Challis.