Really interesting post about how Boxed Ice handled some of the issues that appeared when using MongoDB for storing massive datasets (17,810 collections, 43,175 indexes and 664,158,090 documents).
MongoEngine is a Document-Object Mapper (think ORM, but for document databases) for working with MongoDB from Python. It uses a simple declarative API, similar to that of the Django ORM.
So what does it do?
Here’s a brief run-down of some of the main features of MongoEngine:
Document schema declaration and validation
An elegant querying syntax, similar to that of Django
Document inheritance, with support for “polymorphic querying”
Aggregation methods, such as sum and average
Advanced query condition combination using Q objects
Session and authentication backends for Django
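Q objects are what let you combine conditions with & (and) and | (or). To give a feel for what that produces, here's a toy sketch in plain Python. MiniQ is a hypothetical class, not MongoEngine's implementation. It turns Django-style keyword conditions into MongoDB query documents, handles only a handful of lookup suffixes, and always nests combined conditions under $and or $or without trying to simplify them. The views field in the example is made up.

```python
# Toy, hypothetical stand-in for MongoEngine's Q objects (not its real code):
# keyword conditions become MongoDB query documents, and & / | combine them.

# Django-style lookup suffixes mapped to MongoDB comparison operators.
OPERATORS = {"gt": "$gt", "gte": "$gte", "lt": "$lt", "lte": "$lte", "ne": "$ne"}


class MiniQ:
    def __init__(self, query=None, **conditions):
        if query is not None:
            # Already-built query document (used when combining)
            self.query = query
            return
        self.query = {}
        for key, value in conditions.items():
            field, _, op = key.partition("__")
            if op:
                # e.g. views__gt=100 -> {"views": {"$gt": 100}}
                self.query.setdefault(field, {})[OPERATORS[op]] = value
            else:
                # Plain equality, e.g. tags="mongodb"
                self.query[field] = value

    def __and__(self, other):
        return MiniQ({"$and": [self.query, other.query]})

    def __or__(self, other):
        return MiniQ({"$or": [self.query, other.query]})


# Posts tagged "mongodb", or popular posts with a given title
q = MiniQ(tags="mongodb") | (MiniQ(views__gt=100) & MiniQ(title="Introducing MongoEngine"))
print(q.query)
```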
Show me the code!
To define a document, just inherit from the Document class and add some fields:
To save documents to the database, just instantiate a Document object, fill in the fields, and call save:
post = BlogPost(title='Introducing MongoEngine', slug='introducing-mongoengine')
post.content = 'MongoEngine is a Document-Object Mapper...'
post.tags = ['mongodb', 'mongoengine']
post.save()
To find documents, use the objects attribute of a Document subclass:
# Get a dictionary with tags as the keys and frequencies as the values
tag_freqs = BlogPost.objects.item_frequencies('tags')
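To make the shape of that result concrete, here's a plain-Python sketch of the same count over a few in-memory posts. The sample posts and the tag_frequencies helper are hypothetical, and nothing here talks to MongoDB; each entry in each post's tags list counts once.

```python
from collections import Counter

# Hypothetical sample documents, shaped like the BlogPost above
posts = [
    {"title": "Introducing MongoEngine", "tags": ["mongodb", "mongoengine"]},
    {"title": "Git branching", "tags": ["git"]},
    {"title": "Querying with MongoEngine", "tags": ["mongodb", "python"]},
]


def tag_frequencies(docs):
    """Count how often each tag appears across all documents' tag lists."""
    counts = Counter()
    for doc in docs:
        # Posts without tags simply contribute nothing
        counts.update(doc.get("tags", []))
    return dict(counts)


tag_freqs = tag_frequencies(posts)
print(tag_freqs)  # {'mongodb': 2, 'mongoengine': 1, 'git': 1, 'python': 1}
```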
Every blog needs comments, right?
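from mongoengine import (Document, EmbeddedDocument, StringField,
                         DateTimeField, ListField, EmbeddedDocumentField)

class Comment(EmbeddedDocument):
    author = StringField()
    content = StringField(required=True)
    date = DateTimeField()

# Modify the previously defined BlogPost document
class BlogPost(Document):
    ...
    comments = ListField(EmbeddedDocumentField(Comment))
    ...

# Let's add a comment; this is performed as an atomic operation
# (form and post_id are assumed to come from the surrounding request handler)
comment = Comment(author=form['author'], content=form['content'])
BlogPost.objects(id=post_id).update(push__comments=comment)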
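The push__comments keyword follows an operator__field pattern that maps onto a MongoDB update document, in this case the $push operator, which appends to an array. MongoDB applies an update to a single document atomically, so two concurrent comments can't overwrite each other. The sketch below is a simplified, hypothetical translator, not MongoEngine's actual code: it handles only a few operators, represents the embedded comment as a plain dict, and num_comments is an invented field used to show two operators landing in one update.

```python
# Simplified, hypothetical translation of MongoEngine-style update keywords
# (operator__field=value) into a MongoDB update document.
UPDATE_OPERATORS = {
    "set": "$set",
    "unset": "$unset",
    "inc": "$inc",
    "push": "$push",
    "pull": "$pull",
    "add_to_set": "$addToSet",
}


def build_update(**kwargs):
    update = {}
    for key, value in kwargs.items():
        op, sep, field = key.partition("__")
        if not sep or op not in UPDATE_OPERATORS:
            raise ValueError(f"unsupported update keyword: {key}")
        # Nested fields also use __, e.g. set__author__name -> "author.name"
        field = field.replace("__", ".")
        update.setdefault(UPDATE_OPERATORS[op], {})[field] = value
    return update


comment = {"author": "Alice", "content": "Nice post!"}
update = build_update(push__comments=comment, inc__num_comments=1)
print(update)
```

With PyMongo, a document like this is what you'd pass as the second argument to update_one.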
I could go on, but I’ll keep this post short and to the point. For more information, see the documentation. The source is available on GitHub, so fork it and have a play!
Great article describing a solid Git workflow. It suggests doing all development in a separate develop branch, keeping master only for production-ready code. The develop branch is merged back into master when it reaches a stable state, and anything that gets merged into master is tagged as a release.
Three other main types of branch are used:
“Feature branches” are used to develop individual features for upcoming or distant releases; these branch off and merge back into develop.
For preparing releases (last-minute bug fixes, etc.), a release branch is used. From that point on, no major features will be added to that release, and the develop branch will be used for development on the next release.
If an existing release needs an urgent fix, a hotfix branch is created from the last tag on master. When the bug is fixed, this branch is merged back into master, and the new release is tagged.
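As a rough illustration of those branch lifecycles, here's a Python sketch that only builds the git command sequences, without running anything. The helper names, the branch naming scheme (feature/..., release-..., hotfix-...) and the version numbers are hypothetical. Two details come from the original branching model rather than from the summary above: merges use --no-ff so each branch leaves a merge commit, and release and hotfix branches are also merged back into develop so their fixes aren't lost.

```python
# Hypothetical helpers that spell out the git commands for each branch type in
# the develop/master workflow. They only build command strings; nothing is run.

def feature_commands(name):
    branch = f"feature/{name}"
    return [
        f"git checkout -b {branch} develop",  # branch off develop
        "git checkout develop",
        f"git merge --no-ff {branch}",         # merge back into develop
        f"git branch -d {branch}",
    ]


def release_commands(version):
    branch = f"release-{version}"
    return [
        f"git checkout -b {branch} develop",  # no major features from here on
        "git checkout master",
        f"git merge --no-ff {branch}",
        f"git tag -a {version} -m 'Release {version}'",  # merges to master get tagged
        "git checkout develop",
        f"git merge --no-ff {branch}",         # carry last-minute fixes back
        f"git branch -d {branch}",
    ]


def hotfix_commands(version, last_tag):
    branch = f"hotfix-{version}"
    return [
        f"git checkout -b {branch} {last_tag}",  # start from the last release tag
        "git checkout master",
        f"git merge --no-ff {branch}",
        f"git tag -a {version} -m 'Hotfix {version}'",
        "git checkout develop",
        f"git merge --no-ff {branch}",            # keep develop from regressing
        f"git branch -d {branch}",
    ]


plan = hotfix_commands("1.2.1", last_tag="1.2")
for cmd in plan:
    print(cmd)
```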
I like to do most of my Python development inside virtualenvs. I also create a Git repository for any project that matters or that will have any kind of continued development. Constantly switching between the different virtualenvs to work on different projects used to be tedious, but this issue was largely solved by the fantastic virtualenvwrapper.
Virtualenvwrapper has certainly improved the situation, but even so, I can’t help but worry that the cd project-x, workon project-x, (do some work), cd .., deactivate workflow is going to lead me to an early grave caused by a severe case of RSI. So in order to retain my good health, I’ve hacked together a bash function that automatically activates a virtualenv when you cd into a Git repository, and deactivates it when you leave the repository.
By default, it assumes that the virtualenv’s name will be the same as the repository’s name, but this can be overridden by creating a file called .venv in the repository’s root directory with the name of another virtualenv in it.
# Automatically activate Git projects' virtual environments based on the
# directory name of the project. Virtual environment name can be overridden
# by placing a .venv file in the project root with a virtualenv name in it
function workon_cwd {
    # Check that this is a Git repo
    GIT_DIR=$(git rev-parse --git-dir 2> /dev/null)
    if [ $? -eq 0 ]; then
        # Find the repo root and check for virtualenv name override
        GIT_DIR=$(builtin cd "$GIT_DIR" && pwd)
        PROJECT_ROOT=$(dirname "$GIT_DIR")
        ENV_NAME=$(basename "$PROJECT_ROOT")
        if [ -f "$PROJECT_ROOT/.venv" ]; then
            ENV_NAME=$(cat "$PROJECT_ROOT/.venv")
        fi
        # Activate the environment only if it is not already active
        if [ "$VIRTUAL_ENV" != "$WORKON_HOME/$ENV_NAME" ]; then
            if [ -e "$WORKON_HOME/$ENV_NAME/bin/activate" ]; then
                workon "$ENV_NAME" && export CD_VIRTUAL_ENV="$ENV_NAME"
            fi
        fi
    elif [ -n "$CD_VIRTUAL_ENV" ]; then
        # We've just left the repo, deactivate the environment
        # Note: this only happens if the virtualenv was activated automatically
        deactivate && unset CD_VIRTUAL_ENV
    fi
}

# New cd function that does the virtualenv magic. "builtin cd" always calls the
# real cd, even if .bashrc is sourced again after the alias below already exists
function venv_cd {
    builtin cd "$@" && workon_cwd
}
alias cd="venv_cd"
Note: for this to work you will need virtualenv and virtualenvwrapper installed. To use it, just stick it in your .bashrc somewhere below where your $WORKON_HOME is specified.
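To check the naming rule on its own, here's a Python sketch of the same lookup. It's an approximation, not a port: it walks up from a directory looking for a .git entry instead of calling git rev-parse --git-dir, then uses the repository directory's name unless a .venv file overrides it. The virtualenv_name_for function is hypothetical.

```python
import tempfile
from pathlib import Path


def virtualenv_name_for(directory):
    """Return the virtualenv name for the Git repo containing directory, or None."""
    path = Path(directory).resolve()
    # Walk up towards the filesystem root looking for the repository root
    for candidate in [path, *path.parents]:
        if (candidate / ".git").exists():
            override = candidate / ".venv"
            if override.is_file():
                # strip() trims surrounding whitespace; the shell version only
                # drops trailing newlines via command substitution
                return override.read_text().strip()
            return candidate.name
    return None  # not inside a Git repository


# Usage: build a throwaway repository layout and resolve from a subdirectory
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp, "project-x")
    (repo / ".git").mkdir(parents=True)
    (repo / "src").mkdir()
    default_name = virtualenv_name_for(repo / "src")
    (repo / ".venv").write_text("shared-env\n")
    override_name = virtualenv_name_for(repo / "src")

print(default_name, override_name)  # project-x shared-env
```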