Notes for a Proposed Unified Tipitaka Reference System
(May 2005)

Note: This document has been superseded by "ATI Tipitaka Filename Convention".

Abstract

A growing number of translated fragments of the Pali Tipitaka are appearing on websites in more than a dozen languages. This trend is likely to accelerate, with the prospect of major sections of the Tipitaka becoming available online in many languages and alphabets by the year 2010. In order to facilitate inter-language cross-referencing of these translations, it will be useful to implement a standardized system of file-naming and directory hierarchies that remains invariant across alphabets and languages. As a first step towards this goal, this article proposes a simple scheme whereby textual units (suttas, verses, etc.) may be uniquely identified by a nine-digit reference number that accurately reflects the text's canonical location within the Tipitaka. This reference system is not intended to replace conventional Tipitaka naming and numbering schemes (e.g., PTS, IBRIC, etc.), but merely to serve as a basis for assigning useful language-invariant filenames.


The problem

The naming and organization of Tipitaka files on websites across the Internet is haphazard. In the absence of any standard file naming conventions, webmasters have resorted to inventing filing systems to suit their own immediate needs. To illustrate, the following suttas from the Majjhima Nikaya turned up recently on a web search:

MN 28 (English) /canon/sutta/majjhima/mn028-tb0.html
MN 28 (English) /e-tipitaka/mn-28.htm
MN 28 (English) /028-mahahatthipadopama-sutta-e1.htm
MN 23 (English) /dhamma-vinaya/mo/mn/mn023_mo.htm
MN 28 (Czech) /sloni-stopa.htm
MN 28 (German) /majjhima/m028n.htm
MN 28 (Italian) /tipitaka/mn28.html
MN 28 (Portuguese) /sutta/MN28.htm
MN 107 (Russian) /dhamma/canon/mn107.htm
MN 28 (Serbian) /budizam/canon/majjhima/mn28.html
MN 16 (Swedish) /buddha/95.htm

Several problems with these systems are immediately apparent:

  1. The file names are not consistent. Some name the file using a Pali name (028-mahahatthipadopama-sutta-e1.htm); some use the local language's translation of the Pali title (sloni-stopa.htm); some use a local index number (95.htm); some use uppercase nikaya abbreviations (e.g., MN); some use lowercase; some use hyphens and underscores. And so on.
  2. The directory hierarchies are rarely designed in a way that reflects the structure of the Tipitaka itself. For small sites, this is not necessarily a problem. As sites grow and accumulate more texts, however, file management problems are bound to escalate. For example, if we store the MN files in the directory /home/sutta/, do we also put SN files (perhaps all 2,889 of them) there? This is impractical for more than a few dozen suttas.
  3. Because each site has its own file-naming system and directory hierarchy, there is no way to know a priori where, for example, MN 28 is located on a given site that is known to offer sutta translations. This makes it extremely difficult to install hyperlinks between websites, as the webmaster must first decode the target site's often opaque filing system.

These inconsistencies pose no problem for small websites. Large websites, and sites that tend to serve as distribution source points for other sites, should, however, be concerned.

Finding a solution

Here are some general principles to help guide us to a good solution:

  1. Directories should be structured in a way that reflects the structure of the Tipitaka itself. The Tipitaka has already been nicely divided into nikayas, vaggas, etc., a system that readily translates into directories. For example, instead of putting every sutta from SN into the directory /sutta/samyutta, put them in subdirectories according to their samyutta: /sutta/samyutta/devata/, /sutta/samyutta/devaputta/, /sutta/samyutta/kosala/, etc. This divides the directories up into manageable chunks containing no more than a few dozen suttas.
  2. Filenames should contain information about the file's location in the Tipitaka. For example, instead of storing MN 28 in majjhima/28.html, store it as majjhima/mn028.html. This way, if the file is taken out of the context of its enclosing directory (e.g., as an e-mail attachment), one needn't open the file to figure out from where in the Tipitaka it came.
  3. Filenames containing numbers should be numerically sortable. Naming your files mn4.html, mn43.html, and mn150.html is not a good idea: an alphabetical listing puts them out of numerical order (mn150.html, mn4.html, mn43.html). Instead, pad the numbers with leading zeros: mn004.html, mn043.html, and mn150.html.
  4. Use numbers for filenames and directories. Details forthcoming.
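
The sorting problem described in principle 3 can be reproduced with a minimal Python sketch. The helper name padded_name and its default three-digit width are illustrative assumptions, not part of this proposal.

```python
def padded_name(prefix: str, number: int, width: int = 3, ext: str = ".html") -> str:
    """Build a filename such as 'mn004.html' with a zero-padded number.

    The name and the default width of 3 are assumptions made for this sketch.
    """
    return f"{prefix}{number:0{width}d}{ext}"


sutta_numbers = (4, 43, 150)

# Unpadded names sort character by character, not by numeric value.
unpadded = sorted(f"mn{n}.html" for n in sutta_numbers)

# Zero-padded names sort in the same order as the numbers themselves.
padded = sorted(padded_name("mn", n) for n in sutta_numbers)

print(unpadded)
print(padded)
```

A width of three suits MN, whose sutta numbers have at most three digits; other collections may need a different width.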

Developing a numbering scheme

The details of how the various sections are divided and numbered are spelled out in ATI's "Sutta Reference Numbers" page. For example, we divide Dhp into verses, not into chapters and verses. This simplifies the enumeration of the texts enormously.

Comments?

In the "Section name" entries below, names separated by slashes are given in the order Vinaya / Sutta / Abhidhamma.

Division: Pitaka
Range:    1-3
    Value  Section name
    1      Vinaya
    2      Sutta
    3      Abhidhamma

Division: Nikaya (Sutta Pitaka) or Book (Vinaya & Abhidhamma)
Range:    1-7
    Value  Section name
    1      Sv / DN / Dhs
    2      Mv / MN / Vibh
    3      Cv / SN / Dhtk
    4      Par / AN / Pug
    5      - / KN / Kv
    6      - / - / Yam
    7      - / - / Pt

Division: Book (Sutta Pitaka) or Chapter (Vinaya & Abhidhamma)
Range:    00-?? (What's the max. number of chapters in Vin and Abhi?)
    Value  Section name
    00     {applies to Nikayas 1-4 only}
    01     ? / Khp / ?
    02     ? / Dhp / ?
    03     ? / Ud / ?
    04     ? / It / ?
    05     ? / Sn / ?
    06     ? / Vv / ?
    07     ? / Pv / ?
    08     ? / Thag / ?
    09     ? / Thig / ?
    10     ? / J / ?
    11     ? / Nidd / ?
    12     ? / Ps / ?
    13     ? / Ap / ?
    14     ? / Bv / ?
    15     ? / Cp / ?
    16     ? / Netti / ?
    17     ? / Pk / ?
    18     ? / Miln / ?
    ??     ? / - / ?

Division: Vagga {SN} or Nipata {AN} (Sutta Pitaka); Section? (Vin and Abhi)
Range:    00-?? (What's the max. number of "sections" in Vin and Abhi?)
    Value  Section name
    00     {DN, MN, and KN only}
    01     I {SN and AN}
    02     II {SN and AN}
    03     III {SN and AN}
    ...    ...
    10     X {SN and AN}
    11     XI {SN and AN}
    12     XII {SN}
    ...    ...
    55     LV {SN}
    56     LVI {SN}
    ??     ??

Division: Sutta or verse (Sutta Pitaka); what do you call these units in Vin and Abhi?
Range:    001-999
    Value  Section name
    001    1
    002    2
    ...    ...
    999    999

Examples

  • 1.1.11.01.001 = Background story to bhikkhu Sanghadisesa rule #11 (Horner Vol. 1 p.304)
  • 1.1.45
  • 1.2.08.26.001 = Story of the monk with dysentery (Mv VIII.26.1; Horner Vol. IV, p. 431)
  • 2.1.00.00.022 = DN 22 (Foundations of Mindfulness)
  • 2.2.00.00.007 = MN 7 (Simile of the Cloth)
  • 2.3.00.56.011 = SN LVI.11 (Setting the Wheel of Dhamma in Motion)
  • 2.4.00.03.065 = AN III.65 (Kalama Sutta)
  • 2.4.00.10.208 = AN X.208 (Sublime Attitudes)
  • 2.5.02.00.345 = Dhp 345
  • 2.5.03.03.002 = Ud III.2
  • 2.5.09.06.010 = Thig VI.10
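
The examples above all follow the same 1.1.2.2.3 digit pattern, which the following Python sketch builds and validates. The names TipitakaRef, format_ref, and parse_ref, and the field names, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class TipitakaRef(NamedTuple):
    """The five fields of the proposed nine-digit reference (hypothetical names)."""
    pitaka: int  # 1 digit
    nikaya: int  # 1 digit (nikaya, or book in Vinaya and Abhidhamma)
    book: int    # 2 digits
    vagga: int   # 2 digits
    unit: int    # 3 digits (sutta or verse)


# Digit width of each field, in order.
WIDTHS = (1, 1, 2, 2, 3)


def format_ref(ref: TipitakaRef) -> str:
    """Render a reference as dot-separated, zero-padded fields."""
    return ".".join(f"{value:0{width}d}" for value, width in zip(ref, WIDTHS))


def parse_ref(text: str) -> TipitakaRef:
    """Parse a string such as '2.3.00.56.011'; reject anything malformed."""
    parts = text.split(".")
    if len(parts) != len(WIDTHS):
        raise ValueError(f"expected {len(WIDTHS)} fields, got {len(parts)}: {text!r}")
    for part, width in zip(parts, WIDTHS):
        if len(part) != width or not (part.isascii() and part.isdigit()):
            raise ValueError(f"field {part!r} is not {width} digit(s) in {text!r}")
    return TipitakaRef(*(int(part) for part in parts))


print(format_ref(TipitakaRef(2, 3, 0, 56, 11)))
print(parse_ref("2.5.02.00.345"))
```

Because every field has a fixed width, sorting the formatted strings as plain text gives the same order as sorting the parsed tuples.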

Advantages

  • Useful for comparing and cross-referencing filesystems with translations in different languages and alphabets, since numbered references are more universally recognized -- and more conveniently written -- than their corresponding transliterated Pali names.
  • Filenames sort immediately into canonical order.
  • Typographic style can be easily localized. With the suitable filter and lookup tables, a file named "2.4.00.03.065" can be presented either as "AN III.65" (ATI) or "Anguttara Nikaya Threes, No. 65" (old BPS books) or "A i 187" (PTS Pali), etc., according to your local publishing style.
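
As a rough illustration of the "filter and lookup tables" idea, the Python sketch below renders Sutta Pitaka references in the ATI style. It is a sketch under assumptions: the function names ati_style and to_roman are hypothetical, the lookup tables are copied from the numbering table in this article, and the Vinaya and Abhidhamma are left out because their divisions are still undecided here.

```python
# Lookup tables taken from the numbering table in this article (Sutta Pitaka only).
NIKAYAS = {1: "DN", 2: "MN", 3: "SN", 4: "AN"}
KN_BOOKS = {
    1: "Khp", 2: "Dhp", 3: "Ud", 4: "It", 5: "Sn", 6: "Vv",
    7: "Pv", 8: "Thag", 9: "Thig", 10: "J", 11: "Nidd", 12: "Ps",
    13: "Ap", 14: "Bv", 15: "Cp", 16: "Netti", 17: "Pk", 18: "Miln",
}

ROMAN = [(90, "XC"), (50, "L"), (40, "XL"), (10, "X"),
         (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Convert 1..99 to a Roman numeral (enough for a two-digit field)."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def ati_style(ref: str) -> str:
    """Render e.g. '2.4.00.03.065' as 'AN III.65' (hypothetical helper)."""
    pitaka, nikaya, book, vagga, unit = (int(part) for part in ref.split("."))
    if pitaka != 2:
        raise ValueError("this sketch covers the Sutta Pitaka only")
    # KN texts are named by their book; the other nikayas by the nikaya itself.
    abbr = KN_BOOKS[book] if nikaya == 5 else NIKAYAS[nikaya]
    if vagga == 0:
        return f"{abbr} {unit}"
    return f"{abbr} {to_roman(vagga)}.{unit}"


print(ati_style("2.4.00.03.065"))
print(ati_style("2.5.02.00.345"))
```

Other publishing styles would need their own tables; a PTS volume-and-page citation, for instance, cannot be computed from the reference number alone.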

Disadvantages

  • Is it really necessary?

Multiple translators, languages

What about the case where a text is translated by several translators in the same language? How do we distinguish them?

  1. Give each translator his/her own complete root-level file hierarchy. That is, translations by Translator A would reside in their own directory structure, while those by B would reside in another one. Disadvantage: When putting together an anthology of translations by different translators, the files would have to be renamed in order to distinguish them. E.g., we'd have to rename Translator A's Dhammapada verse to "2.5.02.00.345.A" to distinguish it from B's "2.5.02.00.345.B".
  2. Include translator info in the filename. Disadvantages: (1) it lengthens and complicates the filename; (2) to be compatible with other file sets, all files in an anthology by just one translator would have to include the translator code.

The preceding discussion applies equally to translations in different languages. Translations in language X would be distinguished from those in language Y thus: "2.5.02.00.345.X" and "2.5.02.00.345.Y".

Translator code:
A four-byte code -- two alphas followed by two digits. E.g., "tb00" for Thanissaro Bhikkhu, "ao00" for Andrew Olendzki, etc. The digit field allows for expansion: "kb00" (Khantipalo Bhikkhu), "kb01" (Kumara Bhikkhu), etc. Because there may eventually be more than 10 b's (or some other popular last initial), it's a good idea to have two digits.
Language code:
Two-byte language code, following the ISO 639 convention (e.g., "en" for English, "ru" for Russian, etc.). Note that two bytes do not leave enough room to accommodate extended language codes such as "en-us" and "en-gb". Is this a problem?
Combined code:
When combining language and translator codes, the translator code comes first. Why? Two reasons: (1) It's easier to extend the language code someday (e.g., from "en" to "en-us") by appending to the filename rather than inserting into it; (2) if a text is further translated, we can just keep appending language codes. Thus: "kb01en" is Kumara Bhikkhu's translation of the Pali into English. "kb01enru" is someone's translation into Russian of Kumara Bhikkhu's English translation. "kb01enrusw" is someone's translation of that into Swahili. (How to handle extended language codes when more languages are appended is beyond the scope of this proposal.)
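
The combined code can be split mechanically, as the following Python sketch suggests. It assumes lowercase ASCII codes and plain two-letter language codes only (no extended codes such as "en-us"); the function name parse_combined is hypothetical.

```python
import re

# Two letters and two digits (translator), then zero or more two-letter
# language codes. Lowercase ASCII is an assumption of this sketch.
CODE_RE = re.compile(r"([a-z]{2}[0-9]{2})((?:[a-z]{2})*)")


def parse_combined(code: str) -> tuple[str, list[str]]:
    """Split e.g. 'kb01enru' into ('kb01', ['en', 'ru'])."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid combined code: {code!r}")
    translator, langs = match.groups()
    # The language chain reads left to right: first the language translated
    # into from the Pali, then each later retranslation.
    languages = [langs[i:i + 2] for i in range(0, len(langs), 2)]
    return translator, languages


print(parse_combined("kb01enrusw"))
```

A bare translator code such as "tb00" parses with an empty language list.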

Examples

  • 2.5.02.00.345.tb00 = Thanissaro's Dhp 345
  • 2.5.09.06.010.tb00 = Thanissaro's Thig VI.10
  • 2.5.09.06.010.ao00 = Andrew Olendzki's Thig VI.10