Create an API

Create an API to allow developers to search and access resources. The Sunlight Foundation helps government agencies open up their data and create APIs so developers can use the data in applications. They could be a great partner in your work.


Submitted by scott 3 years ago

Comments (5)

  1. Moderator

    Thanks - this is exactly what we're intending to do. But rather than create an API we're trying to develop a system that uses existing APIs/standard interfaces to permit access to these resources. And we think if we use the right combination of APIs and technologies we can enable this kind of access to resources not just for federal resources but for other organizations who have materials to share (as well as information on materials that others are sharing).

    Thanks for the tip about Sunlight, I hadn't considered them in this way.

    3 years ago
  2. scott Idea Submitter

    Thanks for the comment. What existing APIs/standard interfaces are you thinking of using? From my research, there aren't any that would work.

    Also, as a developer, I'd far, far, far rather you employ a RESTful API than shoehorn your content into an API that poorly fits your content. Facebook, Twitter, and the NYTimes all have different APIs--that's not a problem. What they share is that their APIs are all quite good and exchange data in a standard format (JSON or XML).

    In your work, I hope that you write a custom API that uses standard data formats (JSON/XML). I wouldn't worry about creating a new API. I'd worry about creating it well.

    3 years ago
  3. Moderator

    Thanks. I'm trying to hold the project to a "one day doctrine" on APIs, which means a sophisticated web programmer (sounds like you're one - I think I qualify too) can study the API spec in 2 hours and in the next 6 hours of the same day can build a working, simple prototype using it. If it takes longer than that, it's useless. This excludes a lot of SOAP, CORBA, WSDL type stuff which IMO is a good thing.

    Right now we're looking in two directions:

    To submit "stuff" into the registry network, proposing SWORD. To pull stuff back out, proposing "OAI-PMH." That said, we don't want to lock everyone into only these access strategies but they are widely deployed in the resource/repository world as best as we can tell.
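
To make the "pull stuff back out" side concrete, here is a minimal sketch of building an OAI-PMH harvest request. The `ListRecords` verb and the `metadataPrefix`, `from` and `until` arguments come from the OAI-PMH protocol itself; the endpoint `https://example.org/oai` and the helper name `build_list_records_url` are assumptions made up for illustration, not anything the project has published.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # verb and metadataPrefix are required for an initial ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # from/until are optional datestamp bounds for selective harvesting.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)


# https://example.org/oai is a placeholder endpoint, not a real repository.
url = build_list_records_url("https://example.org/oai", from_date="2010-01-01")
print(url)
```

The response to such a request is XML, so a harvester would still need an XML parser on the receiving end.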

    For implementation, the two front runners at the moment are CouchDB and Amazon CloudFront. Since these are both basic APIs, so long as you understand the policy rules for LR, you could read/write into the network using any protocol you want that is compatible with them. Couch uses JSON as a primary storage mechanism, which is very appealing. CloudFront has the appeal of very cheap infrastructure that someone else keeps running for you. You can run Couch on CouchOne or on Amazon, which is similar, but CloudFront is almost certainly going to be more reliable, if possibly less flexible.

    We are developing an "envelope" concept where any content submitted to the registry network would be wrapped in some data envelope. I'm advocating that we use JSON for that rather than XML, since JSON is so much simpler to program against -- you can load it directly into programmatic vars rather than parse it like XML, which seems too heavyweight to me.
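
As a rough illustration of the JSON point, here is a minimal sketch of wrapping a submission in an envelope and reading it back. Every field name (`envelope_version`, `submitter`, `payload`) and the helper `wrap_in_envelope` are assumptions invented for this example; no actual envelope format is specified in this thread.

```python
import json


def wrap_in_envelope(payload, submitter):
    """Wrap a payload dict in a hypothetical envelope and serialize it."""
    envelope = {
        "envelope_version": "0.1",  # assumed field, for illustration only
        "submitter": submitter,     # assumed field
        "payload": payload,         # the submitted content itself
    }
    return json.dumps(envelope)


# Placeholder content, not a real registry record.
raw = wrap_in_envelope(
    {"title": "Sample lesson plan", "url": "https://example.org/lesson/1"},
    submitter="example-agency",
)

# One call turns the JSON text back into ordinary dicts and strings.
envelope = json.loads(raw)
title = envelope["payload"]["title"]
print(title)
```

The same round trip in XML would need a parser plus code to walk the element tree, which is the extra weight the comment refers to.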

    You should join the public listserv where these topics are discussed: https://groups.google.com/group/learningregistry

    All input is welcome there. (We moderate but only for spam in case you wonder why your first message is slow to post).

    3 years ago
  4. scott Idea Submitter

    Hi Steve,

    I'm glad to hear there's an experienced web programmer at the helm who understands the details. I think this bodes well for the project.

    CloudFront looks great. I've heard good things about CouchDB as well. Also, I far prefer JSON over XML...I assumed that XML would be a requirement because it's heavy and cumbersome and this is a product of the government after all...

    I think one of the things missing from this conversation is examples. On the discussion group and on the website, it seems there's an assumption that everyone knows what things would be stored on the repository.

    I have no idea.

    I mean, I can guess. Sure. But I don't know the details and since the details are the product, I think it would be very, very helpful to have half a dozen examples of resources that are scattered among government agencies and will be able to be consolidated.

    This is why I think examples are important: the tendency to over engineer things is massive and easy to do when designing systems for abstract, vaguely understood problems. Clear, defined problems lead to simple appropriate solutions.

    Here are some of my specific questions that would likely be answered by examples:

    1) Are the assets residing on the agencies servers or on the LR?

    2) What do these assets look like? Are they videos, pdfs, proprietary formats?

    3) Are there multiple assets per 'document'? If so, what is the average number? What is the upper bound?

    4) What meta-data is currently attached to these documents?

    5) What are the specific use cases someone would search against? I'm strongly against blindly putting faith in tags. Tags are useful, but not a complete solution. I far prefer a well done tree or clearly defined categories to navigate a great deal of complex data. For instance, if you search CDW for computer gear, they have a nice filtering bar on the right that lets you filter the products by categories. As a teacher and curriculum writer for Baltimore City Public Schools, I'd like to see neatly defined categories. Tags can be useful for ad-hoc categorizations that don't fit into groups, but they shouldn't be the canonical way of organizing information. For instance, if I'm searching for artwork, I want to search for it by time period and then artist and possibly country. What else is really relevant to searching for artwork? Let artwork be tagged and have any number of other metadata, but let there be one, canonical organization scheme that fits 80% of the use cases.

    I fear that this product will create a repository with many pieces of meta-data but little useful organization. I'd love to see the LR form an opinion about the most important categories for each type of work. While there should certainly be flexibility to allow resources to be categorized differently, a clear set of 'defaults' is important.

    Looking through tech history, some of the most successful products (iPad, iPhone, the Ruby on Rails framework, 37signals) had very, very strong opinions about how information should be organized.

    I hope the LR is a model for well thought out, practical 'default' meta-data tags and organization structures.

    To conclude this long comment, let me give a specific use case and example of ideal organization:

    Context: I'm a teacher teaching The Scarlet Letter to sophomores in HS.

    What I want: To find all the art and historical documents of that time period.

    The interface I'd like to have: I type in the dates 1625-1675. I get results back grouped by discipline:

    For Art, I see all the works the National Gallery has for that period

    For History, I see all the documents the National Archives have for the period.

    For Science, I see all the articles and publications the Smithsonian has for the period.

    I realize I want to filter the results by geography: I type in New England (or Boston) and then see such things as the art the National Gallery has that was made in New England, the legal charters, writs, and contracts the National Archives have, and the scientific papers the Smithsonian has that came out of Harvard during that time.

    What I don't want to see is a Google-style list of results that have no meaningful grouping or organization except "relevance" which, really, what does that mean in this context?
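
The grouped, filterable interface described in this use case could be sketched roughly as follows. The records are invented placeholders, and the field names (`discipline`, `year`, `region`) and the function `search_by_period` are assumptions for illustration; nothing here reflects real holdings or an actual LR schema.

```python
from collections import defaultdict

# Invented placeholder records; real ones would come from the participating
# collections and would carry whatever metadata those collections provide.
RESOURCES = [
    {"title": "Placeholder portrait", "discipline": "Art",
     "year": 1640, "region": "New England"},
    {"title": "Placeholder landscape", "discipline": "Art",
     "year": 1660, "region": "Netherlands"},
    {"title": "Placeholder colonial charter", "discipline": "History",
     "year": 1630, "region": "New England"},
    {"title": "Placeholder astronomy paper", "discipline": "Science",
     "year": 1670, "region": "New England"},
    {"title": "Placeholder later treatise", "discipline": "Science",
     "year": 1700, "region": "New England"},
]


def search_by_period(resources, start_year, end_year, region=None):
    """Return titles grouped by discipline, filtered by date range and region."""
    grouped = defaultdict(list)
    for resource in resources:
        # Keep only resources dated inside the requested period.
        if not start_year <= resource["year"] <= end_year:
            continue
        # The geography filter is optional and applied only when given.
        if region is not None and resource["region"] != region:
            continue
        grouped[resource["discipline"]].append(resource["title"])
    return dict(grouped)


by_period = search_by_period(RESOURCES, 1625, 1675)
new_england = search_by_period(RESOURCES, 1625, 1675, region="New England")
print(by_period)
print(new_england)
```

The point of the sketch is that the grouping key (discipline) is a fixed, canonical field rather than a free-form tag, which is what makes the results predictable to browse.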

    Anyways, I hope these comments are helpful in pushing the LR to ground their discussion in actual examples and use cases instead of generic and abstract understandings of the problem (I'm sure there are people in the LR, like possibly yourself, that have very nuanced understandings of the problem...as a newcomer, I'd like to be able to have that as well).

    3 years ago
  5. Moderator


    You should really post that question on our discussion list - which is also public and free to join.


    We're trying to stay out of the opinion creating work. We're trying to give others information to create those opinions. So if we provide a mechanism for 15 organizations to generate usage data around a set of learning resources, it's up to those orgs or others to aggregate the information and come up with better usage suggestions for their users.

    Many groups have tried to implement the other way: building too much functionality into the distribution network. I think the network needs to be de-coupled from the functionality you describe so many approaches can be tried without reimplementing the "base layer."

    For your use case, I'd say that third party sites such as OERCommons, Brokers of Expertise, Curriki, NYLearns, NSDL, Connexions and Hippocampus are where we think users will be going to undertake this activity.

    Learning Registry is being designed to connect these sites with each other and create better information about what resources are available and which ones are relevant for various audiences.

    Like the dev products you mentioned, this project is "opinionated" as well, but the opinion so far is that transport should be flexible, fast and unobtrusive. What materials are transported will vary over time and the network needs to accommodate innovations smoothly. Overdesign will prevent that, which is why we're avoiding the indexing and search/recommendation "layer." Lots of organizations are doing this already - we want to give them more information to do this better.

    Learning Registry is not an end-user tool in that sense - maybe that's the short answer to your question?

    3 years ago