Approaching Principles for Independent Archives

I’m building an open, independent repository of public interest documents around a specific topic. A place where records of interest can be collected, organised and made accessible. An independent archive.

Detention Logs currently presents Incident Report data, but we’d like to present more kinds of data and make what’s there more accessible.

Below I’ve collected some basic principles for an independent archiving project. These are some of my thoughts on approaching the project and an open invitation for thoughts from others which I’ll collect and weave in. The more projects support each other to develop their ideas and practices, the more effective we can all be—

No matter who you are, most of the smartest people work for someone else.

If you’ve also been thinking about these issues, I’d really encourage you to respond or contribute in some way. Often the smallest, simplest bits are the most useful.

Why archive?

Recently, on Twitter, I baited some smart people into thinking about the value of archives. Cassie Findlay came back with this clear response:

Archives are evidence of business / activity. Their use(s) contingent on time, place and relationship to power.

Records (the stuff in archives) are evidence. We need evidence to make informed decisions about all kinds of things: personal decisions, political decisions, design decisions, business decisions, policy decisions, etc. Evidence is crucial to our ideas of legal and social justice and to memory, history, prediction and planning.

An archive is the evidence material and the system of its processing, organisation, presentation and access. It’s the records and the context around them.

Collect, Organise, Provide

There are many interesting unreported documents floating around the web: documents that are evidence of government, business and cultural activity. They’ve been ‘officially published’, but barely meet the loosest interpretation of accessibility. An unhelpfully-named, uncategorised file is quietly placed on an obscure or obscuring webpage. No description. Hiding in public.

For example, all the Incident Report data published by Detention Logs had long been available on Immigration Department FOI disclosure logs or on googleable web servers. It had gone unreported because it was expensive (in time and resources) to understand what the documents were, and to process and analyse them. It took us hundreds of hours to get each incident report out of the original PDF.

While there are Open Government and accessibility guidelines, most of the documents we’ve collected do not meet them. This is especially true for documents received through the Freedom of Information system.

People in a position of power are incentivised to be closed with their information. We need legislation and a culture of transparency to make sure people with useful information share it. The act of collecting, categorising, transcoding and republishing evidence to make it more accessible to more people is an important act of accountability journalism. It lowers the cost for others to use and act upon it.

There is also the issue of secret, unofficial, suppressed and censored records: what does the public have a right to know? The question of “what to collect” is obviously crucial, but here I’m just thinking about what to do with records once they are collected.

Why independent?

Why set up an archive independently? What is the role of projects like InfoAus, OpenAustralia, Homicide Watch, Dronestagram, Planning Alerts, and Detention Logs? There are big government archives, libraries, journalism institutions, universities and corporations with programs to preserve and provide access to records. Why not leave it to the experts?

As more services become digital, and more people carry devices creating more data with them more of the time, the amount of interesting evidence worth preserving is exploding. The task of judging what to keep, what to let go and what should never be collected becomes ever more complex.

Collecting, organising, storing and providing access to this material requires a constant budget, in part because of the unstable nature of digital storage. Funding for the types of institutions that run these projects is not increasing to match this demand.

Already a huge amount of archiving practice has been taken up by corporations like Google, Ancestry and Facebook. We don’t know the future of these projects; we just know that a lot of these companies have deleted a huge amount of information. There’s a strong insurance argument for creating diverse, networked archives.

While there are journalism institutions that publish the primary source material they report on, many don’t. There are certainly legitimate reasons not to publish this material with an article, but you often get the impression that the publisher is in a defensive, competitive headspace, not thinking about the long-term value of their work, or the ways in which their audience might want to explore a story. Most institutions deny access by default. Creating open collections of source material is an act of journalism most easily committed by people who don’t live and die by ‘exclusives’.

It’s also just not that easy to organise and provide access to source material, even when institutions want to. This stuff is difficult.

There are thousands of years of archiving history, and long-established professional communities of archivists, librarians and record keepers. For people working independently, it’s important to engage with that practice, and draw on its deep knowledge. It’s also important to promote the worth of projects on the outside. Many within archivist communities believe their current practices are buckling under the weight of abundant digital data. The unrestrained experiments and improvisations of independent projects can feed back into their toolboxes.

The perspectives of users, archivists, historians, journalists, researchers, designers, engineers, librarians, accountants, administrators, lawyers, teachers, students, artists etc. can be drawn out to answer questions about publishing public records:

  • Why should we collect?
  • What should we collect?
  • Who should have access, and why?
  • What should they be able to do with the records?
  • How can access be provided?

Small parts loosely joined

At a recent meeting of a few information activist type people, Joel Pringle pushed the idea that not every project has to optimise for access by all types of users.

While each of these examples has a clear purpose, they also blur across categories. The more projects succeeding and failing with different approaches, the better.

Small projects are best positioned to engage focused communities. They should also be open, thoughtfully networked and encourage the potential for widespread access.

Striving for universal accessibility and trying to engage a mass audience are very, very different things. It’s easy for unfunded volunteer projects to stretch themselves too thin. A multitude of networked approaches, overlapping in time as well as topic, will reach the furthest.

—the Web gets its value not from the smoothness of its overall operation but from its abundance of small nuggets that point to more small nuggets. And, most important, the Web is binding not just pages but us human beings in new ways. We are the true “small pieces” of the Web, and we are loosely joining ourselves in ways that we’re still inventing.

Why Principles?

Principles are a tool that you put your ideas into so that you don’t have to reconsider all the angles from scratch every time you want to make a decision. They say how a project will pursue its mission, articulating a shared vision for work.

I’m advocating for lots of small projects with divergent approaches, so why suggest common principles? What I’m really trying to do is get people to share their ideas and experiences in a useful form that can be quickly adapted to other projects. Each project should have its own distinct principles, and they should be made available on a /principles page.

Because principles are modular by nature, you can consider your own project’s position and then draw on the knowledge of people who’ve been there before. Reuse them and plug together your own personal set of guidelines. Jeremy Keith’s is a great resource.

How & Why to Contribute

The easiest way to contribute is to leave a comment responding to the principles below. You can use the comment system, email or tweet at me, publish a responding blog post, whatever you like. Preferably you should respond somewhere public so everyone can see the discussion. You’ll see that I’ve also made some recommendations for the type of contributions that might be productive.

I’ve published the principles on GitHub where you can edit and improve them (inspired by ProPublica’s Nerd Guides). I know that not everyone is a GitHub user; here’s a quick video on signing up and a video on why. Please feel welcome to post changes to the principles in the comments section as well. If there are ideas in the comments that should be turned into edits on the GitHub repository, I can handle that.

Why contribute? There’s a lot of work to do and we need lots of people involved to do it. Most people (like me) don’t have formal training in information management. The more we can share our experiences and knowledge and support the community of people doing this work, the more impact our projects will have. The technological landscape that we’re working in is racing away beneath us—I don’t know anyone who feels like they know what’s coming next.

Belonging to a community means participating, observing, and generally being in attendance (either physically or virtually). But being an advocate requires stepping forward and helping to articulate that community’s needs, or advance their interests, or—when necessary—protect their rights. You need to both amplify and clarify the values of a community, not merely share them.

Principles for Independent Archives

Here are some broad, basic goals for an independent archive project and design principles which could help the project achieve them.

This is meant to be the start of a conversation. Any kind of rough contribution or idea is very, very welcome.

Contributions could include:

  • leaving a comment of any length with a response or thought about what an independent archive should do;
  • making suggestions for adding or cutting principles;
  • suggesting references for more information on individual principles;
  • improving the explanations of principles; and/or
  • pointing out typos or how wording could be improved.

Principles for Independent Archives is licensed under a CC0 1.0 Universal Public Domain Dedication. To the extent possible under law, Luke Bacon has waived all copyright and related or neighboring rights to Principles for Independent Archives. If you make a contribution in the comments system or make edits through GitHub, and it is added to or used in the original text, your contribution will also fall under this license to make it as reusable to others as possible.

Goals for an independent archive

  • Reduce the cost of using and acting on the evidence in the archive.
  • Engage new people in the records.
  • Preserve access to the evidence for as long as possible in as many ways as possible.

Principles

Use open standards

Presenting records and data in a consistent, standardised, common format reduces the cost for others to use and act upon them. Open and long-lived formats, standards and structures have a better chance of lasting over time than closed, proprietary standards.
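
As a rough sketch of what this could look like in practice, the Python below writes the same records out as JSON and as CSV, two open and widely supported formats. The field names (`id`, `date`, `summary`) and the sample records are hypothetical and only for illustration; they are not the actual Detention Logs schema.

```python
import csv
import io
import json

# Hypothetical records; the field names are illustrative assumptions.
RECORDS = [
    {"id": "example-001", "date": "2013-01-15", "summary": "Example incident summary"},
    {"id": "example-002", "date": "2013-02-03", "summary": "Another example, with a comma"},
]

FIELDS = ["id", "date", "summary"]


def to_json(records):
    """Serialise records as human-readable JSON, keeping non-ASCII text intact."""
    return json.dumps(records, ensure_ascii=False, indent=2, sort_keys=True)


def to_csv(records, fields=FIELDS):
    """Serialise records as CSV with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(RECORDS))
    print(to_csv(RECORDS))
```

Offering both formats from one source is a cheap way to serve different users: JSON suits other programs and projects, while CSV opens directly in a spreadsheet.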

Think long term

From day one, plan for shutdown. All projects end. Small, under-resourced, personal projects are not excluded. Discuss this situation and have a plan for it. If possible, the records should remain accessible long after work on the project has stopped.

Stay small, let others create meta-collections

Provide clarity over quantity. Build something small. Even just a few dozen documents focused on a specific topic are valuable in the network. If we’re after a multitude of approaches then there’s value in the tiniest project.

Strive for universal accessibility, be accessible by default

Provide multiple ways to access the same content. Usability and accessibility must be at the core of an archive. Machine readability is not only central to accessibility, but will make your project useful to other projects, devices, search engines, scrapers, and so on.

Make the legal accessibility of your project clear: what rights are reserved and how may others use the work? Wide accessibility can support redundancy by allowing and supporting other people to create backup copies.

Store the original record, present its essence over its resolution

It’s important to store a copy of the original records as you found them. You can’t predict what extra interest that master copy may acquire in the future.

Sometimes, presenting this original record may be difficult and even problematic. It’s more important that people can access something rather than nothing. Many people might need to access a version with a smaller file size, or only the metadata.
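
One way to sketch this, assuming the original file is kept untouched on disk, is to record a SHA-256 checksum of the master copy so that any later copy can be verified against it, and to publish a small metadata record alongside it for people who can't or don't need to download the full file. The function name `describe_original` and the metadata fields are hypothetical choices for this example.

```python
import hashlib
import json
import os
import tempfile


def describe_original(path, chunk_size=65536):
    """Return a small metadata record for an untouched original file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large scans or PDFs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "filename": os.path.basename(path),
        "bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
    }


if __name__ == "__main__":
    # A temporary file stands in for a collected document.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"example original record")
    print(json.dumps(describe_original(tmp.name), indent=2))
    os.remove(tmp.name)
```

The original is only ever read, never modified; smaller derivatives can be generated separately and pointed back to the master copy by its checksum.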

Work together

Invite feedback and new perspectives. The network enables these projects and makes them valuable by massively increasing how information can be combined—support the network in turn. Provide helpful links between projects as much as possible. Share and discuss your methods.

Believe in your ideas, an independent perspective is worth a lot

Properly documented, your project’s failures may be more valuable to the community than its successes. If you think something will further your project’s goals, try it.

As long as it supports your goals, stay independent. If another entity is offering to take over work or contribute to the project, consider whether their priorities are fully compatible with the goals of your project.

In confronting the enormous challenges facing archivists, journalists and others looking to promote diverse public access to knowledge, we need a multitude of approaches to find workable solutions.