Dave’s earlier posts sparked some good conversation about tagging. Here is my proposal for how tagging could work on the new version of the site. This proposal isn’t necessarily what we will do; I’m putting it out there to get feedback from the community about whether it’s the right approach.
First, an overview. There are two ways to approach tagging:
- Folksonomy: all the users use their own tagging schemes. There are tools to let users discover tags already in use.
- Ontology: the owners of the site describe exactly what tags people can use, and expect people to use them.
Our goals are also twofold:
- To help readers of science blogs more easily find the content they are looking for, and
- To do so without imposing constraints on the authors of science blogs
I believe that a folksonomy is the best way to meet both goals: it imposes no constraints on authors, and, if things are done right, many of the tags should start to converge. My suspicion is that if we specified a strict list of tags, authors would not want to use them.
But how to make the folksonomy chaos into something useful? We will maintain a database of tags. Each tag’s entry in the database will have, at a minimum (this can be expanded later):
- Name of tag (e.g., “tamarin”)
- List of synonymous tags (“Saguinus”, maybe “tamarind” if we want to support common mistakes)
- List of child tags (“cotton top tamarin”, “cotton top”, “Saguinus oedipus”, etc.; may be very long)
- List of parent tags (“New World monkeys” — may be multiple)
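As a rough sketch only, a tag entry with these four fields could be modeled as below. The class name `TagEntry` and its field names are illustrative assumptions, not a decided schema.

```python
from dataclasses import dataclass, field


@dataclass
class TagEntry:
    """One record in the tag database (names here are hypothetical)."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    children: set[str] = field(default_factory=set)
    parents: set[str] = field(default_factory=set)


# The example entry from the list above.
tamarin = TagEntry(
    name="tamarin",
    synonyms={"Saguinus", "tamarind"},
    children={"cotton top", "Saguinus oedipus"},
    parents={"New World monkeys"},
)

# A brand new tag starts with no relationships at all.
pepsigate = TagEntry(name="pepsigate")
```

Using sets rather than lists means a relationship entered twice is stored only once.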
Bloggers may tag with any of the synonymous tags. Let’s say we do decide to support mistakes. Someone may tag “tamarin” or “tamarind”. Those are different tags, but our system understands that they are synonymous.
Someone searches for “tamarin.” They get a list of posts tagged with either “tamarin” or any of the synonymous tags (so “tamarind” or “Saguinus”).
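Here is a minimal sketch of that lookup. The data, the function names, and the choice to ignore capitalization are all assumptions made for illustration; the real system may work differently.

```python
# Hypothetical data: canonical tag -> synonyms, and post title -> tags used.
SYNONYMS = {"tamarin": {"Saguinus", "tamarind"}}
POSTS = {
    "Tamarin vocalizations": {"tamarin"},
    "Field notes on Saguinus": {"Saguinus"},
    "Cotton tops at the zoo": {"tamarind"},
    "Charm quarks": {"charm"},
}


def build_lookup(synonyms):
    """Map every spelling (lower-cased) to its canonical tag."""
    lookup = {}
    for canonical, alternates in synonyms.items():
        lookup[canonical.lower()] = canonical
        for alternate in alternates:
            lookup[alternate.lower()] = canonical
    return lookup


def search(query, posts=POSTS, synonyms=SYNONYMS):
    """Return titles of posts tagged with the query or any synonym of it."""
    lookup = build_lookup(synonyms)

    def canonical(tag):
        # Tags with no database entry simply stand for themselves.
        return lookup.get(tag.lower(), tag.lower())

    wanted = canonical(query)
    return sorted(
        title
        for title, tags in posts.items()
        if any(canonical(tag) == wanted for tag in tags)
    )


results = search("tamarin")
```

The tags stay distinct in the posts themselves; only the search treats them as equivalent.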
So what are some problems which might arise?
What if one tag is used for two entirely separate things?
A physics blogger uses “charm” to describe a kind of quark. An anthropologist uses “charm” to describe something used medicinally by a tribe of primitive people. A user searching for “charm” will get both.
I submit that this isn’t a huge problem. It isn’t going to happen all that often. When it does, in almost all cases, the user will be able to refine their search to say “I am only interested in ‘charm’ tags used on blogs with a ‘physics’ theme.” It will be annoying to the people who want to see what the parent/children tags are for “charm,” because they’ll get a weird mix of physics and anthropology subjects. But I think it is not going to happen often enough to really be annoying (and it is better than the alternative of trying too hard to control things).
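The refinement might look something like this sketch. It assumes each blog carries a single theme label, which is a simplification, and the posts listed are made up.

```python
# Hypothetical posts; "blog_theme" is an assumed per-blog label.
POSTS = [
    {"title": "Charm quark decays", "blog_theme": "physics",
     "tags": {"charm", "quark"}},
    {"title": "Healing charms in the field", "blog_theme": "anthropology",
     "tags": {"charm"}},
    {"title": "Top quark mass", "blog_theme": "physics",
     "tags": {"quark"}},
]


def search_by_tag(tag, posts, theme=None):
    """Return titles with the tag, optionally only from blogs with a theme."""
    return [
        post["title"]
        for post in posts
        if tag in post["tags"]
        and (theme is None or post["blog_theme"] == theme)
    ]


everything = search_by_tag("charm", POSTS)
physics_only = search_by_tag("charm", POSTS, theme="physics")
```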
Sounds like a lot of work to input parent/children/synonym relationships!
Yes. We will have to start with no relationships at all, just a big flat list of tags. Eventually, each subject area will have one or more curators who help manage it. Part of their job may be to input relationships for tags in their areas. We will have to build a user interface that makes this very easy, and perhaps one that lets users suggest new relationships as well.
The point is that we can do this very gradually. The system will start working immediately, and then be improved with time.
What about brand new tags (“pepsi-gate” vs “pepsigate”)? How can curators possibly keep up with that?
In that case, I believe that the crowd will start to converge, if a) we provide incentives to use the same tags — “if you use the most popular tags, your post will be more discoverable and you’ll get more readers” — and b) we make it very easy for bloggers to find out what the relevant tags are.
Of course, we will provide a list of available tags, organized for readability once we have parent/child relationships. Additionally, we will need a tool to provide tagging suggestions to bloggers while they are writing blog posts. Again, that can be something to do a little ways down the road.
We can also provide a page on the site which lists the currently most popular tags, maybe even the most popular new tags. If it’s clear to someone that they are browsing “pepsigate” posts, then when they write a followup they are likely to remember that tag and use it on their own post.
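Counting tag popularity is simple in principle. The sketch below assumes we already have the tag list for each recent post; the sample data is invented.

```python
from collections import Counter


def popular_tags(posts, limit=10):
    """Return (tag, post_count) pairs, most used first.

    `posts` is an iterable of tag lists, one list per post. A tag repeated
    within a single post is counted once for that post.
    """
    counts = Counter(tag for tags in posts for tag in set(tags))
    return counts.most_common(limit)


# Hypothetical recent posts.
recent_posts = [
    ["pepsigate", "blogging"],
    ["pepsigate"],
    ["pepsi-gate"],
    ["pepsigate", "physics"],
]
top = popular_tags(recent_posts, limit=1)
```

A list like this makes it obvious to a blogger that “pepsigate” is the spelling most people are using.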
Won’t this list of tags become so long that any tool which auto-suggests tags to users will become too slow to use?
This problem can be at least partly alleviated by letting users specify that they are only interested in tag suggestions from particular categories. Once parent/child relationships are in place in the tag database, tag suggestions can be filtered that way. We can also learn from other tools that offer auto-complete over large spaces to see how they solve this problem.
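One common way to keep prefix suggestions fast is to keep the tags sorted and binary-search for the prefix. The sketch below combines that with per-category lists; the categories and tags are made up, and this is only one of several possible approaches.

```python
import bisect

# Hypothetical tag lists per category, kept sorted.
TAGS_BY_CATEGORY = {
    "physics": sorted(["charm", "quark", "tachyon", "tau"]),
    "primates": sorted(["capuchin", "marmoset", "tamarin", "tarsier"]),
}


def suggest(prefix, sorted_tags, limit=5):
    """Return up to `limit` tags starting with `prefix`.

    bisect_left finds the first candidate position, so the cost of the
    lookup grows slowly even when the list of tags is very long.
    """
    start = bisect.bisect_left(sorted_tags, prefix)
    matches = []
    for tag in sorted_tags[start:]:
        if not tag.startswith(prefix) or len(matches) == limit:
            break
        matches.append(tag)
    return matches


primate_suggestions = suggest("ta", TAGS_BY_CATEGORY["primates"])
physics_suggestions = suggest("ta", TAGS_BY_CATEGORY["physics"])
```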
Have folksonomies been successfully used in the past? What are good examples?
Obviously, Flickr is the best example of a site which has completely user-generated tagging. Their mission is somewhat different from ours, though! Do you have examples of folksonomies that work or that have failed?
This post is intended to start discussion, so please, weigh in! What do you think about this approach to handling the huge number and variety of tags in use on science blogs? Is it clear, and do you have questions?
Aggregating blogs is not technically difficult. (See previous posts: How to create an aggregated feed and Feed aggregator choices.) Finding other people to aggregate with can be a challenge, however. Feel free to comment here in order to find other people to aggregate with. One way to start is to suggest a topic (neuroscience? medicine? new bloggers? meta-science blogging? students? faculty? physics? astronomy? anthropology?) that you blog about, and ask if others want to aggregate posts on that topic.
Remember, if you are an independent blogger who wants to be listed on scienceblogging.org, putting together an aggregated feed is currently the only way.
Also, remember that you can choose to aggregate only selected posts if you want, using tags as filters. Or you could aggregate your entire blog.
There are a bunch of feed aggregators out there. However, I haven’t used any of them, so I don’t know which ones work well and which ones work poorly. I encourage people to try them and comment here with their experiences. I can edit this post with useful information from the comments.
Below is a list of web services which will allow you to set up a feed aggregation.
- Yahoo Pipes. Widely used. Pipes has a blog where you can learn more about it. Some people seem to find it hard to use. Others complain that it edits the RSS feeds that it passes along (for example, changing whether the links open a new window or not). The Scienceblogs Diaspora Feed runs on Pipes; you can clone and edit that feed (this may be a good way to get started if you find Pipes confusing). There is a review of Pipes which might be interesting.
- FriendFeed. Another widely used one, and seems to be a better bet than Pipes (but comment here and say why or why not!). FieldOfScience uses this one. Commenter Edward says: “FWIW, I’ve done the grunt work with the Yahoo Pipes. You’ll need a Yahoo account, but once you have one you can simply Clone this pipe: http://pipes.yahoo.com/fieldofscience/full. With your Clone, go to Edit Source, then change the feeds in the Feed Fetch module to yours, and in the Simple Math module put the number of feeds you are combining.”
- XFruits. Seems to be very feature-rich. I don’t know of anyone who is using it.
- FeedKiller. Looks very easy to use. It doesn’t require a login, which means, I think, that the feed won’t be editable later. It puts a FeedKiller ad on each post.
- Feed Informer. Also looks really easy to use.
If you have access to a web server and are able to set up a software package on it, you can run your own feed aggregator. Benefits: no ads are inserted into the feed, and you control the server and its stability. Downside: you need a server and some knowledge of how to run one.
- Feedburner. Once you have a feed, you can use Feedburner to make a new URL for it. This can be nice because a) Feedburner provides usage statistics, and b) your new aggregated feed may have an ugly URL.
- Feed Rinse. Filters feeds for you: “You can rinse your feeds by keyword, author, tag, etc, or filter profanity and more.”
Did I miss anything? I’m sure I did, but I’m happy to add more if you let me know what I left out.
Hi. You don’t know me, but I’m here to try to help out with some of the technical aspects of science blog aggregation. I’m going to start by writing about how some bloggers might get together to set up a blog aggregation.
So: you are an independent blogger, and you want to aggregate your blog with some friends’ blogs, and then you want scienceblogging.org to aggregate that aggregation. What does that mean, and how do you go about it?
The first step is to find a group of people who blog about similar topics at least some of the time.
You sometimes post about cognition, sometimes meta thoughts about science blogging, and sometimes personal ramblings, at ramblingscienceblogger.blogspot.com. Your friend Jane writes about neuroanatomy, addiction, and her young daughter at janesaddictionneuro.blogspot.com. Your friend Bob writes about behavior, neurotransmitters, and his dogs at bobsbehavior.wordpress.com. The three of you would like to create a “Brain and Behavior” aggregated feed. You agree that any posts in any of your three blogs tagged “neuroanatomy” or “behavior” should be included in the new aggregated feed. Posts that don’t have either of these tags won’t be included, although posts that have one or both of these tags and some other tags will be included.
For example, when Jane writes a post about the hippocampus, she tags it “hippocampus” and “neuroanatomy.” This post will be included in the aggregated feed. When she writes a post about her daughter, she tags it “kidblogging” only. It will not be included in the aggregated feed. Bob writes a post about a funny thing his dog did yesterday and tags it “ginger”; it is not included. Then Bob writes a post about how his dog’s behavior during thunderstorms reminds him of a recent article he read about fear conditioning in rats, and he might tag that “ginger” and “behavior.” That post would be included.
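The rule boils down to “include the post if it has at least one of the agreed tags.” Here it is as a tiny sketch, with names chosen only for illustration:

```python
# The tags the three bloggers agreed on.
FEED_TAGS = {"neuroanatomy", "behavior"}


def is_included(post_tags, feed_tags=FEED_TAGS):
    """True if the post carries at least one of the feed's tags."""
    return bool(set(post_tags) & set(feed_tags))


hippocampus_post = is_included(["hippocampus", "neuroanatomy"])
daughter_post = is_included(["kidblogging"])
dog_post = is_included(["ginger"])
thunderstorm_post = is_included(["ginger", "behavior"])
```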
My examples assume that all the bloggers in this group are independent bloggers, but of course they could just as easily be bloggers on a network of some sort.
You choose an aggregator service to manage your new aggregated feed. You tell this aggregator service to aggregate the following feeds:
http://janesaddictionneuro.blogspot.com/feeds/posts/default/-/neuroanatomy
http://janesaddictionneuro.blogspot.com/feeds/posts/default/-/behavior
http://bobsbehavior.wordpress.com/tag/neuroanatomy/feed/
http://bobsbehavior.wordpress.com/tag/behavior/feed/
http://ramblingscienceblogger.blogspot.com/feeds/posts/default/-/neuroanatomy
http://ramblingscienceblogger.blogspot.com/feeds/posts/default/-/behavior
In other words, you are telling the aggregator service to pull in RSS feeds for Jane’s, Bob’s, and your blogs, but only the posts with the tags that you care about. You must include a separate URL for each tag and each feed, so for two blogs and two tags you include four URLs; for three blogs and two tags you include six URLs; and so on. The patterns demonstrated above will work for all Blogspot and WordPress blogs.
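If you have many blogs or tags, you could generate the list of URLs instead of typing it. This sketch uses only the two URL patterns shown above; the function and variable names are made up, and tags containing spaces or punctuation would need extra handling that is not shown here.

```python
from itertools import product

# URL patterns taken from the Blogspot and WordPress examples above.
PATTERNS = {
    "blogspot": "http://{host}/feeds/posts/default/-/{tag}",
    "wordpress": "http://{host}/tag/{tag}/feed/",
}


def tag_feed_urls(blogs, tags):
    """Return one feed URL for every (blog, tag) combination."""
    return [
        PATTERNS[platform].format(host=host, tag=tag)
        for (host, platform), tag in product(blogs, tags)
    ]


blogs = [
    ("janesaddictionneuro.blogspot.com", "blogspot"),
    ("bobsbehavior.wordpress.com", "wordpress"),
    ("ramblingscienceblogger.blogspot.com", "blogspot"),
]
urls = tag_feed_urls(blogs, ["neuroanatomy", "behavior"])
```

Three blogs and two tags give six URLs, matching the count described above.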
The aggregator service then provides you with a new RSS feed which contains all the posts from Jane’s, Bob’s, and your blogs tagged “neuroanatomy” or “behavior.” You publicize that RSS feed however you want — you may just blog about it, or you may create a web page as a home site for it. You definitely let scienceblogging.org know to aggregate it (firstname.lastname@example.org).
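Conceptually, the service combines the entries from all the tag feeds into one list. The sketch below shows the idea, assuming each entry has already been parsed into a link and a publication date. Whether a given service removes duplicates (a post tagged with both “neuroanatomy” and “behavior” appears in two of the input feeds) is something to check for the service you choose; the post links and dates here are invented.

```python
from datetime import datetime


def merge_feeds(*feeds):
    """Combine entry lists, keep one copy per link, newest first."""
    by_link = {}
    for feed in feeds:
        for entry in feed:
            # setdefault keeps the first copy seen for each link.
            by_link.setdefault(entry["link"], entry)
    return sorted(by_link.values(), key=lambda e: e["published"], reverse=True)


# Hypothetical entries; this post carries both agreed tags.
hippocampus = {
    "title": "Hippocampus and habits",
    "link": "http://janesaddictionneuro.blogspot.com/hippocampus-habits",
    "published": datetime(2010, 8, 2),
}
thunderstorms = {
    "title": "Thunderstorms and fear conditioning",
    "link": "http://bobsbehavior.wordpress.com/thunderstorms",
    "published": datetime(2010, 8, 5),
}
jane_neuroanatomy = [hippocampus]
jane_behavior = [hippocampus]
bob_behavior = [thunderstorms]

merged = merge_feeds(jane_neuroanatomy, jane_behavior, bob_behavior)
merged_titles = [entry["title"] for entry in merged]
```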
If you later decide that you want to aggregate your meta science blogging thoughts with some other people who also like to write about science blogging in general, there’s nothing to stop you from having a second aggregated feed on an entirely different topic, with entirely different people, using your same blog. Just use different tags. In fact, one post could show up in both aggregated feeds, if it used the right tags.
What is a tag?
Tagging is a way of noting the subject matter of a particular blog entry. Most blogging platforms will provide a way for you to tag each blog entry.
What tags should I use?
That’s for you and your co-bloggers to decide. It would be nice if bloggers started coming to a consensus on tag names for particular topics, so that aggregating different blogs by tag was easier. We’ll see if this happens.
What if my blog is not a Blogspot blog or a WordPress blog? How do I find out what the right format is for a feed for a particular tag?
Try doing a web search for “tag RSS” and the name of your blogging platform. Or comment here and I will try to help you figure it out.
What’s the point of this, anyway?
Subject aggregations are convenient for readers — a way to get an overview of blog posts by topic, rather than by author. They are also a way to build community, with several different authors working together to generate related content.
What aggregator services are out there and which are the best ones to use?
There seem to be several to choose from, but I don’t have experience with them to know which are better. If you have found particular ones that you like or don’t like, comment here and let people know. Yahoo Pipes seems to be widely used, but there are others.