nptech, tagging

Our collective blindspot: nptech and del.icio.us

Before you read the rest of this article: take a look at my attempt to wrangle the nptech feed. It’s only the first step in a long list of steps that I think we need to take as a community towards resolving tagging issues.

I’ve started to read with interest how Michele Martin is trying to create another nptech resource, this time using Rollyo. I think she made the same mistake I did: building the search engine before setting up the parameters for successful operation. More on that later. After thinking about this issue for some time, I’ve come to believe that the nptech tag and del.icio.us are our community’s collective blindspot. I think we’ve been seduced by the notion that del.icio.us has an open API without considering the possibility that the supposedly open API is actually quite limited and requires some programming skills in order to extract useful data.

As a programmer myself, I’m obsessed with finding ways for nonprogrammers (who make up a large portion of the nptech community) to participate in tag parsing and tag analysis. We should provide tools that let relative nptech novices enter our tagstream as quickly as possible, while also supporting relatively sophisticated data analysis should a novice eventually decide to plunge further into the data. Unfortunately, there are too many reasons why del.icio.us is not a proper resource for a progressive and open community such as ours.

  • The API doesn’t allow for automated extraction over the entire timeline of the tag. That is, we have to screen scrape in order to traverse all the possible tags in the timeline. If you take a look at the Perl code that is shown at unthinkingly.com, you’ll see some of the acrobatics that have to be performed to access the tags themselves. This prevents the entire nptech tag database from being easily accessed via outside services such as the much ballyhooed Yahoo! Pipes. Right now, there’s a del.icio.us bug that prevents something like a thousand tagged items from showing up. This is not good.
  • The API doesn’t allow for automated URL extraction from the del.icio.us Web site. This is much the same problem as issue #1, but right now you can’t simply enter something like http://del.icio.us/tags/nptech/sites to retrieve all the sites that have ever been given the nptech tag. This is a problem but could be easily surmounted with some screenscraping.
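
For readers who have never seen it, here is a minimal Python sketch of what "screen scraping" means in practice. The sample HTML and its class name `bookmark` are invented for illustration and are not the actual del.icio.us markup; a real scraper would also have to fetch and walk every page of the tag's timeline, and would break whenever the site changed its layout.

```python
from html.parser import HTMLParser

# Invented sample markup for illustration; NOT the real del.icio.us page structure.
SAMPLE_PAGE = """
<ul>
  <li><a class="bookmark" href="http://example.org/a">First resource</a></li>
  <li><a class="nav" href="/tag/nptech?page=2">next</a></li>
  <li><a class="bookmark" href="http://example.org/b">Second resource</a></li>
</ul>
"""


class BookmarkScraper(HTMLParser):
    """Collects the href of every <a class="bookmark"> element it sees."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only anchors carrying the (hypothetical) "bookmark" class count.
        if tag == "a" and attr_map.get("class") == "bookmark":
            self.urls.append(attr_map.get("href"))


def scrape_bookmarks(page_html):
    """Return the bookmarked URLs found in one page of HTML."""
    scraper = BookmarkScraper()
    scraper.feed(page_html)
    return scraper.urls


print(scrape_bookmarks(SAMPLE_PAGE))
```

Even this toy version depends on knowing exactly how the page is marked up, which is the barrier to entry described above.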

I suggest that those issues above will actually lead us to discerning what is important when we maintain a tag database for our community. I propose the following parameters for a “definitive” and “canonical” source of nonprofit technology tags.

  1. We should have a common, easily accessible database of tagged sites and resources. As long as we use del.icio.us as one of our main tag databases, we will not be able to achieve this end. Basically, I want to traverse the entire database without using awkward programming workarounds that keep our peers who are not programmers from accessing or analyzing that data.
  2. Any API to the data source should allow us to extract the following pieces of information from each tag:
    1. URL of the resource
    2. tagger ID or username
    3. date tagged
    4. description
    5. title
    6. multiple tag list but with nptech as the sole required tag

    All of that should be done WITHOUT screenscraping methods and just through an API method call alone.

  3. It should be just as easy and as well-supported as our current workflow. This means support for newly tagged resources going into the database and RSS feeds coming out of the database in the same manner as we are currently working. It would also allow for enhancements to our workflow such as filtering and the opportunity to edit the tag after the tag has been presented to the community.
  4. Tagged site lists should be easily converted into XML or OPML.
  5. We should have a way to provide for the re-editing of nptech tags and probably, a Digg-like interface to help us evaluate the more popular of these tags.
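
As a rough illustration of parameters 2 and 4, the record and the OPML export could be sketched in Python as follows. The class name `TaggedResource`, its field names, and the function `to_opml` are all hypothetical, chosen only to mirror the six fields listed above; the choice of OPML attributes (`type`, `text`, `url`, `category`) is one plausible mapping, not a settled format.

```python
from dataclasses import dataclass, field
from datetime import date
import xml.etree.ElementTree as ET


@dataclass
class TaggedResource:
    """One tagged item, carrying the six fields listed above (names are hypothetical)."""
    url: str
    tagger: str
    date_tagged: date
    title: str
    description: str = ""
    tags: list = field(default_factory=lambda: ["nptech"])

    def __post_init__(self):
        # nptech is the sole required tag; any other tags are optional.
        if "nptech" not in self.tags:
            raise ValueError("every record must carry the nptech tag")


def to_opml(resources, list_title="nptech tagged sites"):
    """Serialize the resources as a flat OPML list of links."""
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = list_title
    body = ET.SubElement(opml, "body")
    for res in resources:
        ET.SubElement(body, "outline", {
            "type": "link",
            "text": res.title,
            "url": res.url,
            "category": ",".join(res.tags),
        })
    return ET.tostring(opml, encoding="unicode")


example = TaggedResource(
    url="http://example.org/a",
    tagger="some_user",
    date_tagged=date(2007, 2, 17),
    title="Example resource",
    tags=["nptech", "tagging"],
)
print(to_opml([example]))
```

The tagger, date, and description are kept on the record but left out of the OPML in this sketch; a fuller export would need to decide where they belong.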

All the above means that the del.icio.us tag stream cannot and should not be the ultimate resource for finding out about nptech tagged resources. The del.icio.us API is too brittle to be used in a transactional mode and workarounds are too high a barrier to entry. What we should consider is the following:

1. Pligg with an automated RSS feed – this gives us the ability to still use our current workflow (dumping things into Technorati or del.icio.us) but then rate all the existing items via the Pligg site
2. Conversion of that RSS feed via a text parser (and no, it can’t be Yahoo! Pipes, that technology is too high-level) into what Google Custom Search Engine calls an augmented feed
3. Automated export of that feed into a Google CSE, which then gives us an easily accessible database from which the site list can be exported via XML.

Unfortunately, even in the scenario above we’d have to do some work in order to make the Pligg give up tagged site info in point #2. However, it would only have to be done once as that could be written as an open API call as well.
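
The conversion in step 2 might look something like the Python sketch below. It reads an RSS 2.0 feed, reduces the item links to a list of unique sites, and writes them out as XML. The sample feed is invented, and the output element names (`Annotations`, `Annotation`, `Label`) and the label `nptech_sites` are assumptions loosely modeled on Google CSE annotation files; check Google's documentation for the exact format before relying on them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Invented sample feed standing in for the Pligg RSS output.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>nptech (sample)</title>
<link>http://example.com/feed</link>
<item><title>A</title><link>http://example.org/post/1</link></item>
<item><title>B</title><link>http://example.org/post/2</link></item>
<item><title>C</title><link>http://www.example.net/tools</link></item>
</channel></rss>"""


def sites_from_rss(rss_text):
    """Return the unique hostnames of the item links, in first-seen order."""
    root = ET.fromstring(rss_text)
    hosts = []
    # Only item-level links count; the channel's own <link> is skipped.
    for link in root.findall("./channel/item/link"):
        host = urlparse((link.text or "").strip()).netloc
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def to_annotations(hosts, label="nptech_sites"):
    """Write one entry per site; element names are an assumed layout."""
    annotations = ET.Element("Annotations")
    for host in hosts:
        node = ET.SubElement(annotations, "Annotation", {"about": host + "/*"})
        ET.SubElement(node, "Label", {"name": label})
    return ET.tostring(annotations, encoding="unicode")


hosts = sites_from_rss(SAMPLE_RSS)
print(hosts)
print(to_annotations(hosts))
```

Because this only touches the RSS feed, it would work the same whether the items came from Pligg, del.icio.us, or a merged feed.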


25 Comments

  • On 02.17.07 Michele said:

    A great reason why non-programmers (such as myself) should leave the search engines up to the programmers. I know just enough to make me dangerous. :-)

    Seriously, though, I think you’re right that creating options for non-techies to dig into the data stream is important. In particular I think there’s an issue of helping practitioners to see how technology can be integrated into their work practices in a way that makes them more effective as workers. We’re not there, yet, obviously.

  • On 02.17.07 Allan Benamer said:

    Thanks, Michele, for your kind comments. The point of our community is to spread the knowledge, so it really behooves us to build tools that, while technical, have the lowest point of entry possible. For one thing, it acknowledges the way that nonprofits steadily accrue nonprofit expertise and the historical reasons as to how the nptech community formed.

    Del.icio.us really frustrates me and is starting to frustrate others and it’s really time for us to start considering other less proprietary ways of sharing our data. Del.icio.us is still usable as a point of entry but the data can’t stay there if we want to work with it. At least, the change will be gradual and nobody has to change their bookmarklets.

    http://www.nonprofittechblog.org/pligg is just one stab at a fix and if we do adopt a Pligg-like site for our tagging, I’d hope it ends up at the nptech.info site as this isn’t meant to be on my blog. It would speak of the same exclusivity we’re trying to fight. The good news is that the pligg product is open source so future nptech practitioners can add or modify the code. We can’t do that with del.icio.us.

    The Google CSE idea is there but it’ll take some time to write the necessary code to make it all work correctly. Someone has to do what Google calls augmentation of the RSS feed and that does take up a little bit of code but it will only have to be done once. As you can see, we’re looking at building an ecosystem that handles the addition, parsing and storage of tags.

  • On 02.17.07 Beth said:

    Allan,

    Thanks for doing the coding and eating your own dog food (sorry, couldn’t resist)

    Okay, I have some questions. What feed is feeding the Pligg site? Or is it all user submissions?

    Can you visually diagram the flow?

  • On 02.17.07 Beth said:

    Okay, I did a diagram. Tell me where I’m not getting it in the general flow …. I wouldn’t know an API from an apricot ..
    http://beth.typepad.com/beths_blog/2007/02/allan_benamers_.html

  • On 02.17.07 Peter Campbell said:

    I concur wholeheartedly with your conclusion about del.icio.us, Allan – as addicted as we all are to it, it isn’t set up to support the type of analysis and repackaging that we should be doing on the nptech tagstream in order to make it a much better resource.

    Another weak point is in my stewardship of the nptech.info site – I’m very well-meaning and completely time-strapped, so even when I have volunteers willing to help out, I get caught up in my insane workload and fail to capitalize on them. But there’s some hope:

    First, David Geilhufe has offered to host the site over on CivicSpace’s servers, which would plop it in the hands of a bunch of Drupal pros who all happen to be friends of ours with an interest in the project. That should solve the numerous technical issues that I’ve been having with the site that have also contributed to delays in moving it along.

    Second, my career is about to change, probably dramatically, as my last day after six and a half years with SF Goodwill is on March 30, and my next gig is likely to be very different – of the opportunities I’m chasing down, all of them would be working from home (at least to start), and some of them would possibly allow me to better focus on my far too numerous side projects.

    So we’ll see where that goes. I’m maintaining the list of volunteers, but I’m also recruiting people (like everyone in this thread) as, at a minimum, clearly interested parties in the type of portal that nptech.info should be developed into.

  • On 02.17.07 abenamer said:

    Yes, nptech.info should be a Digg-like site for our RSS feeds and eventually should also host its own Google CSE (not Manseo although I can start you off with Manseo’s current site list). The “business” reason should be:

    We need to lower the barrier of entry to the nptech tag stream by allowing new nptech practitioners access to the history of the tag and its major practitioners (the Beths and Deborahs of our world) and to contributing to the tag itself by either contributing bookmarks, voting on new links or helping to add new subtags to each new link. We also need to let researchers access the tag history and manipulate its data.

    If David wants to host it, that’s fine. I just don’t want to be in a position where we end up with no action on the proposed workflow and the tag has suffered from a lot of project stutters as it is. I prefer a fairly robust hosting environment though, similar to the ISP I use currently for hosting, nexcess.net. They let me do web-based FTP and PuTTY sessions. At the very least, we should have that capability on David’s server.

    I believe that if we do it right, a future nptech.info site should be able to make money via Google Adsense as search-directed ad clicks have high ROI for everyone involved. I see no reason why nptech.info should not be self-funding in that respect. It doesn’t have to be on a volunteered server at all.

    At this point, it’s not the cost of the solution that bothers me but whether or not volunteers have the bandwidth to give any proposed new RSS workflow the time and work it needs. Setting up the Pligg was moderately difficult – if you don’t know your server environment, you’re gonna cry. Setting up the Google Subscribed Link feed is going to be a matter of pulling down the RSS feed and then performing an XSL transform or doing some REGEX stuff in the language of your choice. And of course, there will be the continual upgrades to the Pligg installation and any other server upgrades. In some respects, this is why I prefer a commercial server, I don’t think volunteers have time for all the cruft that server maintenance entails.

  • On 02.17.07 Peter Campbell said:

    Interesting. And a good vision. We need to bring David in on this conversation – he has both volunteered to help build the site and to host it, and I’m reasonably sure that we’d have a lot of access on his servers. But I’ll caution that we cannot commit to this type of development without committed administrators, including a few with skill sets like ours (while I haven’t played with Pligg, I’ve played with all sorts of other stuff, and I’ve been using *nix/regex and other tools for about 20 years, working my way through Perl and PHP and on to Ruby – I’m betting we have a lot in common, Allan).

    So here’s a question: should we be meeting on this at NTen? I’m slated to do a Salesforce thing that I’m completely unprepared for, and I bet that Holly would let me last-minute replace it with an NPTech session, if there isn’t one already in place. But I’m thinking of one that does revolve around the ideas and feasibility of growing the web site. Is that worth pursuing?

  • On 02.18.07 Beth said:

    Peter: I want to attend that session. Make sure it isn’t scheduled at the same time as other sessions I’m doing. With that said, what about a NPTECH affinity group meeting?

    Peter, did I send you the info about the CPsquared conversation?

    Allan, can you blog a bit more about the user role thing?

  • On 02.18.07 David Geilhufe said:

    The big issue here is time commitment. I want to work on it an hour a week or so, but it’s easy for me to get overwhelmed. nptech.info has a couple of issues:
    (1) Do we have enough administrators? 3-4 reliable people are required.
    (2) Do we want to do custom stuff? If so we need to use a dedicated install… CSOD is useful for all the standard Drupal capability, but custom code means we now need a dedicated site and system admin.

  • On 02.18.07 Beth said:

    David, if someone can teach me, I can help. I’ll also document so others can help. But can’t do much until after NTC.

  • On 02.19.07 Marnie Webb said:

    Right on, Allan. I’m with you that del.icio.us isn’t robust enough to meet the needs you are talking about. But people aren’t just using del.icio.us. They are using Furl, magnolia, upcoming, microformat tags in their own blog posts, flickr, YouTube and, I’m sure, other things.

    So, I think we have to think about the different ways that people seem to be using/want to be using the information and then set up complementary ways to access it:

    * search engine. Maybe instead of based on suggested topic areas a search engine could be based on searching through bookmarks, blog posts, photos, videos, events?
    * a taxonomy. This is where del.icio.us really fails but putting the nptech tag into a database of some kind so that you can end up w/ something like this: http://demo.siderean.com/facetious/facetious.jsp?tn=0subject&tv=nptech&ss=1
    * a tag cloud. This could really help the \

  • On 02.21.07 abenamer said:

    As a sidenote, all those other sources besides del.icio.us can be merged into a new NPTECH metafeed via Yahoo Pipes…

    I don’t mind Yahoo! Pipes too much — it’s fairly innocuous as RSS merging technology goes. I think the community will have to create a Yahoo account and just merge all those other feeds together.

    The Google CSE can only really search URLs — theoretically, it wouldn’t be impossible to search through bookmarks but the bookmarks would have to be in OPML. It IS possible to search images but they’d have to be considered a URL resource that was set as a permalink or we’d never ever find it again.

    I like the facetious demo and in fact, Google CSE actually deals with the issue of multiple audiences. If you go to http://www.nonprofittechblog.org/manseo you can see a demo of that. A true taxonomy? No, not really. This is the price we have to pay for having a distributed audience with very loose tagging authority (or none at all). I don’t mind that price as long as people feel free to contribute. It’s our job though to untangle the tags by exposing them to another process via the Pligg site.

  • On 02.27.07 Ben Sheldon said:

    It seems like a very sophisticated technology process that is actually pretty impressive all told. But it also seems like a pretty big sledgehammer for something that I didn’t realize was a large problem. Which isn’t to say that how the nonprofit tech community disseminates information couldn’t be improved but this seems really huge for what I see as a very small (active) community.

    I wonder with Pligg, what the signal-to-noise ratio would be–the worry being if only 100 people are participating, and those 100 people are all pretty closely aligned (as I see the nptech community is overall) is it going to just turn into a giant echo chamber, with everything “digged” (pligged?) about equally?

    Most importantly (and I admit that I do primarily grassroots tech), could any of this be done using existing tools and word of mouth? I’m a geek and tend to think of things as technological solutions, but I feel like if a major goal is to come up with the Uber Nonprofit Technology Library and Water Cooler, that’s going to happen through word of mouth, not SEO.

    Lastly, and importantly, how is this going to affect/supplant bloggers like you? Because the community is small, I trust individual voices and for the most part y’all have been doing an excellent job of pointing me to what I want or need or didn’t know about (and giving commentary, which Digg/Pligg doesn’t do well). With Pligg, I feel like at best it’s going to be tracking back to the bloggers’ sites themselves, and considering right now I think I am subscribed to only about 5 nptech blogs (because that’s all I’ve found/found worth reading), that’s not much of an impetus to use it.

  • On 02.27.07 abenamer said:

    Agreed. It’s a big set of tools but it is a big problem. I’m always looking to automate everything because no one has the time to do things manually. I hate to see the nptech.info site lose readership due to admin inactivity and this is a great reason to finally do something about it. Admittedly, all MY reasons are about creating an archive that I can search via Google CSE. Everyone has other agendas though, so I think this set of tools is the only one that can accommodate both those different workflows and my needs.

    I don’t think it really supplants bloggers. In fact, I wouldn’t be surprised if no one REALLY used the Pligg site at all. However, the feed from the Pligg back into the Google CSE will be absolutely useful for what I want to do. Again, the idea is to keep the infrastructure open so that everyone’s needs are met and not to foreclose on future opportunities.

  • On 02.27.07 Peter Campbell said:

    Just to clarify, here, admin inactivity is the exact opposite of the problem. :-) It’s February 27th. After NTC, in early April, I’ll be in a new job situation that, with luck, will not be the 70 hour a week ordeal that I am still under with Goodwill. At that time, I will be able to assess whether I can properly manage nptech.info or not. I’ll be very realistic – if it isn’t going to happen, I will let all interested parties know. Hosting isn’t an issue – there are plenty of people willing to house it, but developing and administering it is.

  • On 02.28.07 abenamer said:

    Well, that sounds pretty good. The problem I’ve found with the admin tasks associated with a server is that it’s a lot of work and generally unrewarding. You’re not actually doing anything cool or sexy, you’re just chopping wood. As a result, my general feeling is that I hate to impose that kind of work on anyone else. That’s why I’d rather we as a community outsource that work to people who are paid to do it. It’s more fun setting up new apps and new ways for those apps to interact than it is to look at disk usage stats and worry about whether or not to get another drive.

    This is why I still think a hosted solution is better. We can then get down to the development work that I’m sure people are willing to do. For my part, I’d rather just set up the pligg, set up the enhanced RSS feed and just manage the pligg and Google CSE. I don’t know what people actually plan to do with the Drupal side of things but it seems that it might be better if we all started eating our own dog food and start to plan out what we WANT from nptech.info. I’ve already stated what I think it should look like but other people should chime in.

    That way, Peter can more easily assess whether or not his new responsibilities are more in line with nptech.info.

  • On 07.05.10 Roundup for February 2007 « Nonprofit Blog Exchange said:

    [...] of the Non-Profit Tech Blog shares his thoughts about nptech and del.icio.us [...]
