ZENO: An Efficient Method for Characterizing Object Shape and for Calculating Transport Properties of Nanoparticles and Synthetic and Biological Macromolecules

(Stevens Institute of Technology)
Contact: ekang1@stevens.edu
Acknowledgements: We thank CTCMS at NIST for funding this development.

OUTLINE
I. INTRODUCTION
II. THE PROGRAM MODELS AN ARBITRARY SHAPE AS A UNION OF SIMPLE BODY ELEMENTS
III. COMPILING AND INVOKING THE PROGRAM
IV. GRAMMATICAL RULES FOR THE BODY FILE
V. DESCRIPTION OF THE INTEGRATIONS
VI. DESCRIPTION OF THE OUTPUT DATA
Appendix A. Examples
Appendix B. Internal dictionary (words recognized by the grammar of the body file)
Appendix C. Uncertainty estimates and propagation
Appendix D. Random numbers
References
I. INTRODUCTION

This document describes procedures for using the program zeno, which computes various shape functionals (e.g., certain electrostatic and hydrodynamic properties) of macromolecules of arbitrary shape. The Fortran code is stored in the file zeno.f. Before running the program, you must create a text file containing the complete body specification, referred to hereafter as the "body file." The program returns its results in a second text file, referred to as the "zeno file" or "report file." Each shape is identified to the program by a character string of at most 25 characters, represented in this document by the symbol <identifier>. The body file is named <identifier>.bod, and the zeno file is named <identifier>.zno.
The program performs up to three different numerical integrations on the body. You request only the integrations you need; you need not request all three. The three integrations are:

1. The "Zeno" computation, a numerical path-integration technique that solves Laplace's equation for two separate boundary value problems: an isolated, charged conductor, and a conductor in an external electric field. N_{z} Brownian paths are launched from a sphere of radius R_{L}, the launch sphere, which completely encloses the body. You specify the value of N_{z}; R_{L} is determined internally from the body specification. This computation determines the electrostatic capacity (capacitance), C, and the nine components of the electrostatic polarizability tensor, α_{mn}. Because of analogies between the hydrodynamic and electrostatic boundary value problems, these quantities then permit the program to estimate the hydrodynamic radius, R_{h}, and the intrinsic viscosity, [η].

2. The "interior" computation, a Monte Carlo integration over the interior of the body. You specify a large number N_{i}. The program generates points at random inside the launch sphere and continues until 2N_{i} points have been found that also lie inside the body. The volume, V, of the body is obtained as the volume of the launch sphere times the ratio of successful points to trial points. The mean-square radius of gyration of the interior of the body, R^{2}_{gi}, is obtained as one half of the mean-square distance between successive pairs of interior points.

3. The "surface" computation, a Monte Carlo integration over the surface of the body. You specify a large number N_{s}. The program generates 2N_{s} points distributed randomly over the surface and uses them to compute the surface area, A; the Kirkwood radius, R_{K}, defined as the harmonic mean distance between arbitrary pairs of surface points; and the mean-square radius of gyration of the surface of the body, R^{2}_{gs}, defined as one half the mean-square distance between arbitrary pairs of surface points.

The raw values obtained from these integrations are then used to compute a number of derived quantities.

II. THE PROGRAM MODELS AN ARBITRARY SHAPE AS A UNION OF SIMPLE BODY ELEMENTS

The body is modeled as the union of simple component body elements. Currently, eight different types of elements are recognized. In this document, the terms "open cylinder" and "closed cylinder" refer to cylinders without and with ends, respectively. (Think of a tin can: a closed cylinder is the can with both ends intact; an open cylinder is the can with both ends cut off.) Table 1 defines the body elements currently in use.
TABLE 1. SUMMARY OF BODY ELEMENT TYPES
The body elements fall into two types. Type A comprises the elements that have three-dimensional interiors, or that consist of surfaces enclosing a region of three-dimensional space: spheres, tori, lenses, ellipsoids, and cubes. Type B comprises the elements that are two-dimensional surfaces: triangles, disks, and cylinders. In all cases, the overall shape is taken to be the union of some collection of these body elements. This gives considerable power in defining shapes. Two important examples are, first, defining a molecule as the region of space occupied by a set of overlapping spheres, and second, defining an arbitrary surface with a grid of triangles (e.g., a geodesic dome). The complete set of body elements used in any given specification, along with their positions and orientations, is given in the body file; see Section IV for the grammatical rules that the body file must follow.

The program distinguishes three major classes of shapes. The first class comprises shapes that have three-dimensional interiors, such as a sphere or a set of overlapping spheres. The second comprises two-dimensional surfaces embedded in three-dimensional space, such as an open cylinder. The third comprises shapes that are purely two-dimensional and lie in a plane, such as a square. (There are also hybrid shapes, for example, the union of a sphere and a square.) Certain shape functionals, such as the electrostatic capacity, are defined for all three classes, and the zeno integration works on all three. The interior integration, however, can be performed only for shapes of the first class, because only then is the interior defined. This creates a problem, because it is not always easy to program the computer to determine which class a given body belongs to. For example, any surface can be represented as a grid of triangles, but deciding whether such a surface belongs to the first or the second class would probably require evaluating certain topological invariants, which is beyond the scope of this program. Therefore, the following rules apply:

(1) If the body contains any body elements of type B, an interior integration is not performed, even if one is requested.
(2) It follows that the volume of the body can be determined only if all of its body elements are of type A. The volume is defined as the total volume occupied by points inside the launch sphere that are also inside at least one of the body elements.
(3) The surface area of the body is defined as the area contributed by all points on the surface of each body element (of either type) that are not in the interior of any other body element of type A.

Users should realize that Rule 3 makes the definition of the surface area ambiguous for some kinds of bodies. Imagine constructing a body as the union of a square and a sphere, but representing both as grids of triangles. Then, by Rule 3, points on the square but inside the sphere will contribute to the surface area calculated by this program.

III. COMPILING AND INVOKING THE PROGRAM

The program was developed using the Linux f77 compiler, so it should compile without difficulty with either of the Linux compilers f77 or f90, using one of the following commands:

f77 zeno.f -o zeno
f90 zeno.f -o zeno

Either command produces an executable file named zeno. You then invoke the program as follows:

./zeno <identifier> <action-code-1> <action-code-2> <action-code-3>

The four strings <identifier>, <action-code-1>, etc., are accessed by the program via calls to the intrinsic Fortran subroutine getarg. The string <identifier> is the identifier name discussed above.
The full body specification must be prepared beforehand and saved in the body file, which must be named <identifier>.bod. Section IV describes in detail the grammatical rules for the body file. Output goes to a file named <identifier>.zno.

Each action code is a string with three parts: a one-character prefix, a string of digits in the middle, and an optional suffix. The prefix is one of the following three characters:

z   do a zeno integration on the body
i   do an interior integration on the body
s   do a surface integration on the body

The optional suffix is one of the three characters:

t = thousand
m = million
b = billion

A few examples demonstrate the proper format of the action codes:

z100t requests the zeno integration with N_{z} = 100 thousand.
i1b requests the interior integration with N_{i} = 1 billion.
s5000000 requests the surface integration with N_{s} = 5000000.

You can specify as few as zero action codes (in which case the program takes no action) or as many as three. For example, the invocation

./zeno spheroid z1m i1m s1m

directs the program to read the body from the file spheroid.bod, to perform the zeno, interior, and surface integrations on the body, each with one million steps, and to write the results to a file named spheroid.zno. As another example, the invocation

./zeno w85 i100t

directs the program to read the body from the file w85.bod, to perform only the interior integration with 100 thousand steps, and to write the results to a file named w85.zno.
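The action-code rules above are simple enough to state as code. The following Python sketch decodes codes such as z100t into an integration name and a step count; the function names and the dictionary it returns are our own illustration, not the parser inside zeno.f.

```python
# Sketch of the action-code rules of Section III (prefix, digits, optional suffix).
# The names and return types are illustrative; zeno.f reads its arguments via getarg.

PREFIXES = {"z": "zeno", "i": "interior", "s": "surface"}
SUFFIXES = {"t": 10**3, "m": 10**6, "b": 10**9}


def parse_action_code(code: str) -> tuple[str, int]:
    """Return (integration name, number of steps) for one action code."""
    if len(code) < 2 or code[0] not in PREFIXES:
        raise ValueError(f"bad prefix in action code {code!r}")
    digits, multiplier = code[1:], 1
    if digits[-1] in SUFFIXES:  # optional t / m / b suffix
        multiplier = SUFFIXES[digits[-1]]
        digits = digits[:-1]
    if not digits.isdigit():
        raise ValueError(f"expected digits in action code {code!r}")
    return PREFIXES[code[0]], int(digits) * multiplier


def parse_command_line(args: list[str]) -> tuple[str, dict[str, int]]:
    """Split [identifier, code1, ...] into the identifier and up to three requests."""
    identifier, codes = args[0], args[1:]
    if len(codes) > 3:
        raise ValueError("at most three action codes are allowed")
    return identifier, dict(parse_action_code(c) for c in codes)


if __name__ == "__main__":
    ident, requests = parse_command_line(["spheroid", "z1m", "i1m", "s1m"])
    print(ident + ".bod", "->", ident + ".zno", requests)
```

For instance, parse_command_line(["w85", "i100t"]) returns ("w85", {"interior": 100000}), matching the second invocation above.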
IV. GRAMMATICAL RULES FOR THE BODY FILE

You use the body file to give the full specification of the body to the program. The name of the body file is <identifier>.bod, where <identifier> is the identifier string provided when the program is invoked. As mentioned above, the body is set up as a union of simple body elements; the allowed elements are listed in Table 1.

The body file consists of a series of commands, each of which consists of a series of character strings. The strings are delimited by spaces or by carriage returns. A single line may contain at most 80 characters, so do not put more than 80 characters between carriage returns. The first string of a command is its "predicate," which identifies the type of command. The remaining strings in the command are "modifiers" of the predicate. The modifiers of each predicate come in a specific order following that predicate, and each predicate requires a specific number of modifiers. No punctuation marks flag the end of one command or the beginning of another: a command is a valid predicate followed by the correct number of modifiers, which are in turn followed by the predicate of the next command. A single command, i.e., a predicate with its modifiers, can be spread across more than one line, and one command may end and another begin on the same line. For ease of reading, however, you will probably want to put carriage returns between commands.

To process the file, the program looks at the first string in the file. This string must be a valid predicate; if it is not, the program aborts. The program then takes the next n strings, where n is the number of modifiers required by that predicate. The program also aborts if it has trouble interpreting any of the modifiers. Assuming these n strings are interpreted successfully, the program repeats the process, reading the next predicate and its modifiers, and so on, until it reaches the end of the file.

The strings are of two types: "numeric" strings, or simply "numbers," and "alphabetic" strings, or simply "words." A valid number is any character string that can be interpreted by the Fortran internal, list-directed (free-format) read statement

read(string,*) value

which converts the numeric string into a floating-point number. A valid word is one of the fifty or so words in the program's "internal dictionary." All these words are given in this section and are also summarized in Appendix B. The program knows whether to expect a word or a number from the position of the string relative to the beginning of the command.

Any line with an asterisk in column 1 is interpreted as a comment and is skipped by the program. Blank lines can also be inserted for readability; these are skipped as well.

Table 2 summarizes each command type, giving the valid predicate or predicates, the valid modifiers, and the action of the command. The commands either add a body element to the growing body (ADD commands) or set the value of some variable (SPECIFY commands). Most commands have several synonymous predicates; e.g., the four strings "SPHERE," "sphere," "S," and "s" are all valid predicates for the ADD SPHERE command. The modifiers must appear in the body file in the order given in Table 2. The order of the commands, by contrast, is not important: body elements can be added in any order, and SPECIFY and ADD commands can be interspersed. See Appendix A for examples of valid body files.

Table 2. VALID BODY FILE COMMANDS
Table 2, contd. VALID BODY FILE COMMANDS
Table 2, contd. VALID BODY FILE COMMANDS
Table 2, contd. VALID BODY FILE COMMANDS
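The reading procedure of Section IV (skip comments and blank lines, then repeatedly consume a predicate followed by a fixed number of modifiers) can be sketched in Python as follows. Because the contents of Table 2 are not reproduced here, the sketch knows only the ADD SPHERE synonyms quoted in the text and assumes, hypothetically, that this command takes four numeric modifiers (x, y, z, radius).

```python
# Sketch of the body-file reading procedure of Section IV.
# Hypothetical assumption: ADD SPHERE takes four numeric modifiers (x, y, z, radius);
# the real modifier lists are those of Table 2.

PREDICATES = {word: ("ADD SPHERE", 4) for word in ("SPHERE", "sphere", "S", "s")}


def tokenize(text: str) -> list[str]:
    """Skip comment lines ('*' in column 1) and blank lines, then split on white space."""
    tokens: list[str] = []
    for line in text.splitlines():
        if line.startswith("*") or not line.strip():
            continue
        if len(line) > 80:
            raise ValueError("body-file lines are limited to 80 characters")
        tokens.extend(line.split())
    return tokens


def parse_body(text: str) -> list[tuple[str, list[float]]]:
    """Consume predicate + modifiers repeatedly until the tokens run out."""
    tokens = tokenize(text)
    commands: list[tuple[str, list[float]]] = []
    pos = 0
    while pos < len(tokens):
        word = tokens[pos]
        if word not in PREDICATES:  # zeno aborts on an unknown predicate
            raise ValueError(f"expected a predicate, found {word!r}")
        name, n_mod = PREDICATES[word]
        mods = tokens[pos + 1 : pos + 1 + n_mod]
        if len(mods) < n_mod:
            raise ValueError(f"{name} needs {n_mod} modifiers, found {len(mods)}")
        # float() stands in for Fortran's read(string,*): it accepts most, but not all,
        # of the same forms (not '1.0d0'), and raises ValueError on a bad modifier.
        commands.append((name, [float(m) for m in mods]))
        pos += 1 + n_mod
    return commands


EXAMPLE = """* two touching spheres; the second command spans two lines
sphere 0 0 -1 1

s 0 0
  1 1
"""

if __name__ == "__main__":
    for command in parse_body(EXAMPLE):
        print(command)
```

Note that nothing but the predicate table tells the reader where one command ends, which is why a missing modifier shifts every later command, exactly as described above.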
V. DESCRIPTION OF THE INTEGRATIONS

A. The Zeno Integration. The zeno integration is a numerical path integration. It simultaneously solves two separate boundary value problems in electrostatics: the charge distribution on, first, a charged conductor and, second, a grounded conductor in a uniform external field. A flowchart summarizing the computation is given in Ref. [7]. Three quantities are required as input: the integration size, N_{z}, which you set through the action code when invoking the program; the launch radius, R_{L}, which the program determines automatically from the body specification; and the skin thickness, ε, which is either set in the body file or, by default, set equal to R_{L} × 10^{-6}. If you set the skin thickness yourself, we recommend values five to six orders of magnitude smaller than the lateral dimensions of the body. The outputs are the electrostatic capacity, C, and the nine components of the electrostatic polarizability tensor, α_{mn}.

B. The Interior Integration. This is a Monte Carlo integration over the total volume of the object. Two quantities are required as input: the integration size, N_{i}, which you set through the action code when invoking the program, and the launch radius, R_{L}, which the program determines automatically from the body specification. The calculation is performed by generating points at random inside the launch sphere and discarding all those that do not also lie inside the body. This continues until 2N_{i} points have been found inside the body. The outputs are, first, the volume, V, of the object, which is set equal to (4/3)πR_{L}^{3}f_{i}, where f_{i} is the fraction of points found inside the body; and second, the square radius of gyration accumulated by points over the interior, which has the definition

$$R^{2}_{gi} = \frac{1}{2V^{2}} \int_{V} d^{3}r \int_{V} d^{3}r' \, |\mathbf{r} - \mathbf{r}'|^{2},$$

where both integrals run over the volume of the body.
This integral is evaluated internally by averaging the square distance between pairs of points over a total of N_{i} pairs. The integration is skipped if any of the body elements are of type B, as explained in Section II.

C. The Surface Integration. This is a Monte Carlo integration over the total surface area of the object. One quantity is required as input: the integration size, N_{s}, which you set through the action code when invoking the program. The calculation is performed by generating points at random over the surface of each body element (with each body element weighted according to its own surface area) and discarding all those that lie inside some other body element. This continues until 2N_{s} points have been located on the surface of the body. The outputs are, first, the surface area, A, of the object, which is set equal to A_{0}f_{s}, where f_{s} is the fraction of points retained and A_{0} is the total combined surface area of all the body elements; second, the square radius of gyration accumulated by points over the surface, which has the definition

$$R^{2}_{gs} = \frac{1}{2A^{2}} \int_{S} dS \int_{S} dS' \, |\mathbf{r} - \mathbf{r}'|^{2},$$

where both integrals run over the surface of the body; and third, the Kirkwood radius, or harmonic mean distance between arbitrary surface points:

$$\frac{1}{R_{K}} = \frac{1}{A^{2}} \int_{S} dS \int_{S} dS' \, \frac{1}{|\mathbf{r} - \mathbf{r}'|}.$$

These integrals are evaluated internally by averaging over a total of N_{s} pairs of points.

D. Computation times. Of the three integration procedures, the zeno integration is generally slower than the other two for comparable values of N_{z}, N_{i}, or N_{s}, although the precise timing depends on the shape. The time for each of the three integrations is generally linear in N_{z}, N_{i}, and N_{s}, respectively. The time for a zeno integration is also linear in the number of body elements.
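The zeno walk of Section A, and the linear cost just noted, can be illustrated with a simplified Python sketch for a body made of spheres. It is not the algorithm of zeno.f in detail: we assume the launch sphere is centred at the origin with a radius you supply, we compute only the capacity (C = R_{L} times the fraction of walks absorbed, in length units where a sphere of radius a has C = a), and we return walkers that leave the launch sphere using the exact exterior hitting distribution of a sphere. Each step evaluates the distance to every body element, which is where the linear dependence on both N_{z} and the number of elements comes from.

```python
import math
import random

# Simplified Zeno walk (walk-on-spheres) estimating the capacity C of a union of
# spheres. Assumptions, not taken from zeno.f: spheres are (x, y, z, r) tuples, the
# launch sphere is centred at the origin with a caller-supplied radius that strictly
# encloses the body, and the default skin thickness is r_launch * 1e-6.


def _unit(rng):
    """Uniformly distributed random direction in three dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def _distance_to_body(x, spheres):
    """Distance from an exterior point to the body: one term per body element."""
    return min(math.dist(x, s[:3]) - s[3] for s in spheres)


def _reenter_or_escape(x, big_r, rng):
    """Walker outside the launch sphere: return a re-entry point, or None if it escapes."""
    r = math.sqrt(sum(c * c for c in x))
    if rng.random() >= big_r / r:  # a walker at distance r ever returns with probability R/r
        return None
    # Inverse-CDF sample of cos(theta) from the exterior harmonic measure of the sphere,
    # where theta is the angle between x and the re-entry point.
    inv_root_s = 2.0 * big_r * rng.random() / (r * r - big_r * big_r) + 1.0 / (r + big_r)
    s = 1.0 / (inv_root_s * inv_root_s)  # squared distance from x to the re-entry point
    cos_t = max(-1.0, min(1.0, (r * r + big_r * big_r - s) / (2.0 * r * big_r)))
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    ez = [c / r for c in x]
    ex = _cross([1.0, 0.0, 0.0] if abs(ez[0]) < 0.9 else [0.0, 1.0, 0.0], ez)
    norm = math.sqrt(sum(c * c for c in ex))
    ex = [c / norm for c in ex]
    ey = _cross(ez, ex)
    phi = 2.0 * math.pi * rng.random()
    return [big_r * (cos_t * ez[k] + sin_t * (math.cos(phi) * ex[k] + math.sin(phi) * ey[k]))
            for k in range(3)]


def zeno_capacity(spheres, r_launch, n_walks, seed=0, skin=None):
    """Estimate C = R_L * (fraction of walks absorbed by the body)."""
    rng = random.Random(seed)
    eps = r_launch * 1e-6 if skin is None else skin
    hits = 0
    for _ in range(n_walks):
        x = [r_launch * c for c in _unit(rng)]  # launch uniformly on the launch sphere
        while x is not None:
            d = _distance_to_body(x, spheres)
            if d < eps:  # walker is inside the skin: count it as absorbed
                hits += 1
                break
            step = _unit(rng)
            # jump to a uniform point on the largest sphere about x that avoids the body
            x = [x[k] + d * step[k] for k in range(3)]
            if sum(c * c for c in x) > r_launch * r_launch:
                x = _reenter_or_escape(x, r_launch, rng)
    return r_launch * hits / n_walks
```

For a single unit sphere the estimate should approach C = 1, and for two touching unit spheres it should approach the known value 2 ln 2 ≈ 1.386; the statistical error falls off as the inverse square root of n_walks.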
When the body elements are spheres, an estimate of the time for a zeno integration on a Pentium III processor is (2.3 × 10^{-8})N_{z} minutes per body element [7]. (This is notable because the cost of finite-element computations grows as the cube of the number of elements.) Ellipsoid body elements are somewhat slower than the others.

VI. DESCRIPTION OF THE OUTPUT DATA

Results of the computation are reported in the zeno file, a text file created by the program. The name of the zeno file is <identifier>.zno, where <identifier> is the identifier string introduced above. Whether a result is reported in the zeno file depends on whether the requisite computation was performed and whether the other requisite variables were set. To codify the rules the program follows in reporting a quantity, we first define several Boolean variables:
Table 3. Boolean variables controlling data output.
The following table summarizes all quantities reported in the zeno file. For each quantity it gives the Boolean truth function that determines whether the program displays it and, in some cases, a brief definition, the formula by which the quantity is computed, and appropriate literature references.
TABLE 4. SUMMARY OF OUTPUT DATA
TABLE 4, contd. SUMMARY OF OUTPUT DATA
TABLE 4, contd. SUMMARY OF OUTPUT DATA
TABLE 4, contd. SUMMARY OF OUTPUT DATA
Appendix A. Examples

Example 1. A cube. Tables A.1 and A.2 display the body and zeno files, respectively, for the computation performed on a cube.

Table A.1 box.bod
Table A.2 box.zno
Example 2. Five overlapping spheres. Tables A.3 and A.4 display the body and zeno files for a body constructed from five overlapping spheres. They illustrate some of the freedom you have in formatting the body file, including the use of comments and blank lines and the extension of commands across more than one line.

Table A.3 some.spheres.bod
Table A.4 some.spheres.zno
Example 3. The myoglobin molecule. Tables A.5 and A.6 display the application of these techniques to a protein molecule, myoglobin. The structure of the molecule was taken from the Protein Data Bank, entry code 1a6m. This protein consists of 151 amino acids and was modeled with overlapping spheres, one sphere of radius 5 Å centered at each alpha-carbon. This example demonstrates the use of the SPECIFY MASS, SPECIFY TEMPERATURE, and SPECIFY SOLVENT commands in the body file. Because these variables were specified, the program was able to compute the mass-normalized intrinsic viscosity and the diffusivity, neither of which appears in the previous zeno files. (For brevity, over 140 lines have been omitted from the body file.)

Table A.5 1a6m.5.bod
Table A.6 1a6m.5.zno
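Example 3 places one 5 Å sphere at each alpha-carbon of PDB entry 1a6m. A minimal Python sketch of that preprocessing step follows: it reads the fixed-column ATOM records of a PDB file and emits one sphere command per alpha-carbon. The output format "SPHERE x y z r" is our assumption about the modifier order (consult Table 2), and keeping only blank or 'A' alternate locations is our own choice.

```python
def ca_spheres_from_pdb(pdb_text: str, radius: float = 5.0) -> list[str]:
    """Return one body-file sphere command per alpha-carbon ATOM record.

    Hypothetical output format "SPHERE x y z r"; check Table 2 for the real
    modifier order. Coordinates are copied unchanged (angstroms in PDB files).
    """
    commands = []
    for rec in pdb_text.splitlines():
        # Fixed PDB columns: atom name 13-16, altLoc 17, x 31-38, y 39-46, z 47-54.
        if not rec.startswith("ATOM") or rec[12:16].strip() != "CA":
            continue
        if rec[16:17] not in (" ", "A"):  # keep one alternate location only (our choice)
            continue
        x, y, z = (float(rec[c:c + 8]) for c in (30, 38, 46))
        commands.append(f"SPHERE {x:.3f} {y:.3f} {z:.3f} {radius:g}")
    return commands
```

Writing the returned lines, one per line, to a file such as 1a6m.5.bod (together with any SPECIFY commands) would give a body file of the kind shown in Table A.5, provided the assumed command format matches Table 2. Each generated line stays well under the 80-character limit.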
Appendix B. SUMMARY OF WORDS RECOGNIZED BY THE GRAMMAR OF THE BODY FILE

Table B1 gives the "dictionary" for the body-file grammar. All synonyms of any one word are listed together. In column 2, "P" indicates a predicate and "M" indicates a modifier.

Table B1.
Appendix C. Uncertainty estimates and propagation

Like any Monte Carlo integration, these integrations display sampling error. The program estimates the sampling error in the integrations and propagates the errors through subsequent computations. Error estimation, propagation, and reporting by the program are explained here.

Significant figures in the input data. The input quantities set in the body file, namely temperature, mass, and solvent viscosity, are assumed to contain experimental error. All digits displayed in a SPECIFY command, including trailing zeros, are considered significant by the program. For example, the two commands "MASS 20000 Da" and "MASS 2.00E4 Da" are interpreted as m = (20000 ± 0.5) Da and m = (20000 ± 50) Da, respectively. These uncertainties are then propagated through subsequent computations as explained below.

Sampling errors in the integrations. The quantities obtained directly from the Monte Carlo integrations described in Section V display sampling error.
To estimate the integration error, each integral is performed 20 times independently, each time with an integration size of N/20. The final value is taken as the mean of these 20 independent integrations, and the sampling error is taken as their standard deviation divided by √20.

Uncertainties resulting from the electrostatic-hydrodynamic analogy. Two formulas given earlier, R_{h} = q_{1}C and the corresponding relation between the intrinsic viscosity [η] and the polarizability, which involves a second coefficient q_{2}, result from analogies between electrostatic and hydrodynamic boundary value problems. The analogies are only approximate, however, so q_{1} and q_{2} are not constant; rather, they vary from shape to shape. The variation is small, so using standard values of the two coefficients lets us use the results of an electrostatic calculation to approximate hydrodynamic properties. The current version of the program uses the values

q_{1} = 1.00 ± 0.01 and q_{2} = 0.79 ± 0.04.

These uncertainties are propagated through subsequent calculations. Consequently, no matter how many Monte Carlo steps are used in the zeno integration, hydrodynamic properties that depend directly on these two coefficients are never reported with more than 2 or 3 significant figures.
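The 20-block error estimate, combined with the interior and surface integrations of Section V, can be sketched in Python for a body made of spheres. The sphere representation, the launch sphere centred at the origin, and all function names are our assumptions; the estimators follow the definitions in Section V (volume from the acceptance fraction, R^{2}_{gi} and R^{2}_{gs} as half the mean-square pair distance, R_{K} as the harmonic mean pair distance).

```python
import math
import random
import statistics

# Sketch: interior and surface Monte Carlo integrations for a union of spheres,
# wrapped in a 20-block error estimate. Assumptions: spheres are (x, y, z, r)
# tuples; the launch sphere is centred at the origin; names are illustrative.


def _unit(rng):
    """Uniformly distributed random direction in three dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _inside(p, spheres, skip=None):
    """True if p lies strictly inside any sphere other than spheres[skip]."""
    return any(i != skip and math.dist(p, s[:3]) < s[3] for i, s in enumerate(spheres))


def interior(spheres, r_launch, n_pairs, rng):
    """Return (V, R_gi^2) from 2*n_pairs random points accepted inside the body."""
    pts, trials = [], 0
    while len(pts) < 2 * n_pairs:
        trials += 1
        rad = r_launch * rng.random() ** (1.0 / 3.0)  # uniform in the launch ball
        p = [rad * c for c in _unit(rng)]
        if _inside(p, spheres):
            pts.append(p)
    volume = 4.0 / 3.0 * math.pi * r_launch ** 3 * len(pts) / trials
    msd = statistics.fmean(math.dist(pts[2 * k], pts[2 * k + 1]) ** 2 for k in range(n_pairs))
    return volume, msd / 2.0


def surface(spheres, n_pairs, rng):
    """Return (A, R_gs^2, R_K) from 2*n_pairs retained surface points."""
    areas = [4.0 * math.pi * s[3] ** 2 for s in spheres]
    pts, trials = [], 0
    while len(pts) < 2 * n_pairs:
        trials += 1
        i = rng.choices(range(len(spheres)), weights=areas)[0]  # weight by element area
        cx, cy, cz, r = spheres[i]
        u = _unit(rng)
        p = [cx + r * u[0], cy + r * u[1], cz + r * u[2]]
        if not _inside(p, spheres, skip=i):  # discard points buried in another element
            pts.append(p)
    dists = [math.dist(pts[2 * k], pts[2 * k + 1]) for k in range(n_pairs)]
    area = sum(areas) * len(pts) / trials
    rgs2 = statistics.fmean(d * d for d in dists) / 2.0
    r_k = 1.0 / statistics.fmean(1.0 / d for d in dists)  # harmonic mean distance
    return area, rgs2, r_k


def block_estimate(integrate, n, blocks=20):
    """Run integrate(n // blocks) `blocks` times; return (means, standard errors)."""
    runs = [integrate(n // blocks) for _ in range(blocks)]
    columns = list(zip(*runs))
    means = [statistics.fmean(col) for col in columns]
    errors = [statistics.stdev(col) / math.sqrt(blocks) for col in columns]
    return means, errors
```

For a unit sphere the means should approach V = 4π/3, R^{2}_{gi} = 3/5, A = 4π, R^{2}_{gs} = 1, and R_{K} = 1; a call looks like block_estimate(lambda m: interior(spheres, 1.5, m, rng), 40000).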
Propagation of uncertainties. All other quantities are computed from the above values. Suppose that the computation of a variable y in terms of several variables x_{j} is represented in the functional form

$$y = f(x_{1}, x_{2}, \ldots).$$

Furthermore, let δy and δx_{j} represent the uncertainties in y and x_{j}, respectively. Then, to estimate δy, the program uses

$$(\delta y)^{2} = \sum_{j} \left( \frac{\partial f}{\partial x_{j}} \right)^{2} (\delta x_{j})^{2}.$$
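The two ingredients of this appendix, the uncertainty implied by the digits of a SPECIFY value and first-order propagation in quadrature, can be sketched in a few lines of Python. Taking the partial derivatives by central finite differences is our own choice for illustration; how zeno.f evaluates them is not stated here.

```python
import math


def implied_uncertainty(number: str) -> float:
    """Half a unit in the last written digit: '20000' -> 0.5, '2.00E4' -> 50.0.

    Handles E exponents only; a Fortran 'D' exponent would need extra parsing.
    """
    mantissa, _, exponent = number.strip().upper().partition("E")
    decimals = len(mantissa.split(".")[1]) if "." in mantissa else 0
    power = int(exponent) if exponent else 0
    return 0.5 * 10.0 ** (power - decimals)


def propagate(f, x, dx, rel_step=1e-6):
    """Return (y, dy) with dy**2 = sum_j (df/dx_j)**2 * dx_j**2.

    Partial derivatives are taken by central finite differences (our choice).
    """
    y = f(*x)
    var = 0.0
    for j, (xj, dxj) in enumerate(zip(x, dx)):
        h = rel_step * (abs(xj) or 1.0)
        up, down = list(x), list(x)
        up[j], down[j] = xj + h, xj - h
        dfdx = (f(*up) - f(*down)) / (2.0 * h)
        var += (dfdx * dxj) ** 2
    return y, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical numbers: R_h = q1 * C with q1 = 1.00 +/- 0.01 and C = 2.00 +/- 0.01
    r_h, d_r_h = propagate(lambda q1, c: q1 * c, [1.00, 2.00], [0.01, 0.01])
    print(r_h, d_r_h)
```

With these hypothetical inputs the propagated uncertainty is √((2.00 × 0.01)² + (1.00 × 0.01)²) ≈ 0.022, dominated by the uncertainty in q_{1}, which illustrates why hydrodynamic results carry only 2 or 3 significant figures.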
Final display of error estimates. As explained in the preceding paragraphs, uncertainty estimates are calculated for all quantities. The final results are always rounded, with the uncertainty in the final digit enclosed in parentheses. For example, the string 1.03(1)E-06 represents the range of numbers (1.03 ± 0.01) × 10^{-6}.
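The parenthesized notation can be produced by rounding the uncertainty to one significant digit and the value to the same decimal place. The Python function below is a sketch of that rule, not the routine used by zeno.f; it assumes a nonzero value and a positive uncertainty well below the value, and it uses Python's round(), which sends exact halves to the even digit.

```python
import math


def format_with_uncertainty(value: float, unc: float) -> str:
    """Format value +/- unc as mantissa(digit)E+xx, e.g. 1.03(1)E-06.

    Assumes value != 0 and unc > 0; the uncertainty is rounded to one
    significant digit with Python's round() (halves go to the even digit).
    """
    exp = math.floor(math.log10(abs(value)))
    for _ in range(2):  # a second pass is needed only if the mantissa rounds up to 10
        mant, u = value / 10.0 ** exp, unc / 10.0 ** exp
        decimals = -math.floor(math.log10(u))  # put the uncertainty in the last digit
        digit = round(u * 10.0 ** decimals)
        if digit == 10:  # e.g. 0.096 rounds to 0.1: move up one decimal place
            decimals, digit = decimals - 1, 1
        if decimals < 0:
            raise ValueError("uncertainty too large for this simple sketch")
        text = f"{mant:.{decimals}f}"
        if abs(float(text)) < 10.0:
            break
        exp += 1
    return f"{text}({digit})E{exp:+03d}"
```

For example, a value of 1.0312 × 10^{-6} with an uncertainty of 0.012 × 10^{-6} is rendered as 1.03(1)E-06, and the mass m = 20000 ± 50 Da from the SPECIFY example as 2.000(5)E+04.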
Appendix D. Random numbers

The program uses the random number generator ran2 published in Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes in Fortran 77, 2nd edition, Cambridge University Press (1992). The program uses the Linux date command to generate the seed for the random numbers. Therefore, after execution, you will find the file <identifier>.dfl in your directory. It contains the Linux date stamp at the time the program was started and can safely be deleted.
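For readers who want to see how ran2 produces its stream, here is a Python transcription of the algorithm as printed in Numerical Recipes (L'Ecuyer's combination of two multiplicative congruential generators with a Bays-Durham shuffle). Python's unbounded integers let us replace Schrage's overflow-avoiding factorization with a plain modulus. How zeno.f turns the date stamp into a seed is not described here, so the seed is simply a constructor argument; check the transcription against the book before relying on bit-for-bit agreement.

```python
# Python transcription of Numerical Recipes' ran2 (F77, 2nd ed.).
# Schrage's method is unnecessary because Python integers do not overflow.

IM1, IM2 = 2147483563, 2147483399
IA1, IA2 = 40014, 40692
IMM1 = IM1 - 1
NTAB = 32
NDIV = 1 + IMM1 // NTAB
AM = 1.0 / IM1
RNMX = 1.0 - 1.2e-7  # largest value returned, 1 - EPS


class Ran2:
    def __init__(self, seed: int):
        """Mirror of ran2's initialisation branch (idum <= 0); any nonzero seed works."""
        idum = max(abs(seed), 1)
        self.idum2 = idum
        self.iv = [0] * NTAB
        for j in range(NTAB + 7, -1, -1):  # NTAB + 8 warm-up steps
            idum = (IA1 * idum) % IM1
            if j < NTAB:
                self.iv[j] = idum
        self.idum = idum
        self.iy = self.iv[0]

    def __call__(self) -> float:
        """Return the next uniform deviate in the open interval (0, 1)."""
        self.idum = (IA1 * self.idum) % IM1
        self.idum2 = (IA2 * self.idum2) % IM2
        j = self.iy // NDIV  # 0-based index into the shuffle table
        self.iy = self.iv[j] - self.idum2
        self.iv[j] = self.idum
        if self.iy < 1:
            self.iy += IMM1
        return min(AM * self.iy, RNMX)
```

Because the generator state depends only on the seed, two runs started from the same seed produce identical streams, which is why recording the date stamp in <identifier>.dfl is enough to reproduce a run.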
References

[1] Derivation of the analogy between capacitance and hydrodynamic radius: Hubbard and Douglas, "Hydrodynamic friction of arbitrarily shaped Brownian particles," Physical Review E, 47, R2983-R2986 (1993).
[2] Derivation of the path-integral technique for the capacitance: Zhou, Szabo, Douglas, and Hubbard, "A Brownian dynamics algorithm for calculating the hydrodynamic friction and the electrostatic capacitance of an arbitrarily shaped object," J. Chem. Phys., 100, 3821-3826 (1994).
[3] Test of the analogy between capacitance and hydrodynamic radius: Douglas, Zhou, and Hubbard, "Hydrodynamic friction and the capacitance of arbitrarily shaped objects," Physical Review E, 49, 5319-5331 (1994).
[4] Derivation and testing of the analogy between polarizability and intrinsic viscosity: Douglas and Garboczi, "Intrinsic viscosity and the polarizability of particles having a wide range of shapes," Adv. Chem. Phys., 91, 85-153 (1995); and Garboczi and Douglas, Physical Review E, 53, 6169-6180 (1996).
[5] Derivation of the path-integral formulation for the polarizability and its application to a number of different shapes: Mansfield, Douglas, and Garboczi, "Intrinsic viscosity and the electrical polarizability of arbitrarily shaped objects," Physical Review E, 64, 061401 (2001).
[6] Application of the zeno algorithm to flexible polymer models: Mansfield and Douglas, "Numerical path-integration calculation of transport properties of star polymers and theta-DLA aggregates," Condensed Matter Physics, 5, 249 (2002).
[7] Application of the zeno algorithm to proteins: Kang, Mansfield, and Douglas, "Numerical path integration technique for the calculation of transport properties of proteins," Physical Review E, 69, 031918 (2004). This reference also gives a flowchart of the zeno algorithm.
[8] A good introduction to the use of path-integral techniques in solving boundary value problems: Douglas and Friedman, "Coping with complex boundaries," IMA Series on Mathematics and its Applications, Vol. 67 (Springer, New York, 1995), pp. 166-185.
[9] The quantities R_{Rus} and R_{Ray} are discussed in an appendix of Douglas and Freed, "Competition between hydrodynamic screening (draining) and excluded volume interactions in an isolated polymer chain," Macromolecules, 27, 6088-6099 (1994). This paper also contains references to the original literature.
[10] An appraisal of how accurately R_{K}, R_{Rus}, and R_{Ray} represent the hydrodynamic radius: Mansfield and Douglas, "Accuracy of several approximate formulas for the hydrodynamic radius and the diffusion coefficient," in preparation.

Last updated: 8/04/06
If you have any questions about this web site, please contact EunHee Kang.