This document describes the set of conventions (sometimes arbitrary) about how to write bitpit code. It is much easier to understand a large codebase when all the code in it is in a consistent style.
bitpit source code is organized in the following directory structure:
If you're designing a new class or other code for bitpit and are not sure where to put it, try to find something similar and put it there. Otherwise, email the bitpit email list for pointers.
Indentation should use only spaces, and each indentation level should be 4 spaces.
The preferred way to ease multiple indentation levels in a switch statement is to align the "switch" and its subordinate "case" labels in the same column instead of "double-indenting" the "case" labels. E.g.:
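    // illustrative sketch; the identifiers are placeholders
    switch (suffix) {
    case 'G':
    case 'g':
        mem <<= 30;
        break;
    case 'M':
    case 'm':
        mem <<= 20;
        break;
    default:
        break;
    }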
Don't put multiple statements on a single line unless you have something to hide:
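    // illustrative sketch: the second call is NOT part of the conditional,
    // even though the layout suggests otherwise
    if (condition) doThis();
        doSomethingEverytime();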
Get a decent editor and don't leave whitespace at the end of lines.
The other issue that always comes up in C++ styling is the placement of braces. Unlike the indent size, there are few technical reasons to choose one placement strategy over the other, but the preferred way, as shown to us by the prophets Kernighan and Ritchie, is to put the opening brace last on the line, and put the closing brace first, thusly:
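    // illustrative sketch
    if (x == y) {
        doSomething();
        doSomethingElse();
    }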
This applies to all non-function statement blocks (if, switch, for, while, do). E.g.:
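    // illustrative placeholder names
    for (int i = 0; i < nCells; ++i) {
        processCell(i);
    }

    while (!done) {
        advance();
    }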
However, there is one special case, namely functions: they have the opening brace at the beginning of the next line, thus:
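    // illustrative sketch
    int computeSum(int a, int b)
    {
        return a + b;
    }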
Heretic people all over the world have claimed that this inconsistency is ... well ... inconsistent, but all right-thinking people know that (a) K&R are right and (b) K&R are right. Besides, functions are special anyway (you can't nest them in C++).
Note that the closing brace is empty on a line of its own, except in the cases where it is followed by a continuation of the same statement, i.e., a "while" in a do-statement or an "else" in an if-statement, like this:
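    do {
        advance();
    } while (!done);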
and
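    if (x == y) {
        doFirst();
    } else if (x > y) {
        doSecond();
    } else {
        doThird();
    }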
Rationale: K&R.
Also, note that this brace-placement also minimizes the number of empty (or almost empty) lines, without any loss of readability. Thus, as the supply of new-lines on your screen is not a renewable resource (think 25-line terminal screens here), you have more empty lines to put comments on.
Do not unnecessarily use braces where a single statement will do.
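    if (condition)
        action();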
and
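    if (condition)
        doThis();
    else
        doThat();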
This does not apply if only one branch of a conditional statement is a single statement; in the latter case use braces in both branches:
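    if (condition) {
        doThis();
        doThat();
    } else {
        otherwise();
    }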
Class names should be in the CamelBack style, e.g. PatchCartesian or PatchOctree.
Class method names should be in the CamelBack style but with a starting lower case letter, e.g. Patch::getId() or Interface::getArea().
Class private member variables should be in the CamelBack style and should have the prefix m_, e.g. PatchCartesian::m_cellSize. Each member variable that the user can modify should have set/get functions; e.g., for the variable int m_variable it is necessary to define void setVariable(int newval) and int getVariable() const.
Static (global) variable names should be prefixed with s_.
Enumeration values should be all capitalized, with underscores avoided if possible. The name of the enumeration should be in the CamelBack style and should indicate the general purpose of the enumeration, so e.g. we use Interface::PositionType, not Interface::InterfacePositionType.
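Taken together, these conventions might look like the following sketch (the class and its members are hypothetical):

    class PatchExample {

    public:
        // Enumeration: CamelBack name, capitalized values without underscores
        enum PositionType {
            INTERNAL,
            BOUNDARY
        };

        // Set/get functions for a user-modifiable member variable
        void setCellSize(double cellSize);
        double getCellSize() const;

    private:
        // Static variable: "s_" prefix
        static int s_instanceCount;

        // Private member variable: "m_" prefix, CamelBack style
        double m_cellSize;

    };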
No names should be added to the global namespace. Everything should be in the bitpit namespace. An exception can be made for names with a static scope declared in a .cpp file, but class member functions never have a static scope.
Names should be kept as private as possible. If declaring a struct or utility class that is used internally by some other class, consider defining it in the .cpp file of the main class or in a separate header only included in that .cpp file, and using (if necessary) only forward declarations (e.g. class Node;) in the header file used by other code. If that is not possible, then consider nesting the definitions such that the scope of the name is limited to that of the class using it.
Any names introduced into the top-level bitpit namespace should be sufficiently unique to avoid conflicts with other code. If you must introduce a class to the top-level bitpit namespace, don't choose an overly generic name like Node.
Developers should avoid using #include in header files, as they propagate dependencies more widely than necessary. The only cases where other includes are needed are to import the declaration for a parent class, and to declare types used as non-pointer and non-reference function arguments. In most cases, a forward-declaration statement (e.g. 'class Interface') will suffice.
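For instance, a hypothetical header can often get by with a forward declaration alone:

    class Interface; // forward declaration: no #include needed

    class ExampleHolder {

    public:
        void setInterface(Interface *interface); // pointer argument
        Interface & getInterface();              // reference return

    private:
        Interface *m_interface;

    };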
Don't use a pre-processor macro where a const variable or an inline or template function will suffice. There is absolutely no benefit to the former over the latter with modern compilers. Further, using macros bypasses the type checking that the compiler would otherwise do for you and, if used in headers, introduces names into the global rather than the bitpit namespace.
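For instance (the names below are illustrative):

    // Instead of:  #define MAX_REFINEMENT 8
    const int MAX_REFINEMENT = 8;

    // Instead of:  #define SQUARE(x) ((x) * (x))
    template<typename T>
    inline T square(T value)
    {
        return value * value;
    }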
Don't define constants that are already provided by standard libraries.
Please don't use things like "vps_t". It's a mistake to use typedef for structures and pointers. When you see a vps_t a; in the source, what does it mean? In contrast, if it says struct virtual_container *a; you can actually tell what "a" is.
Lots of people think that typedefs "help readability". Not so. They are useful only for:
totally opaque objects (where the typedef is actively used to hide what the object is).
Example: opaque objects that you can only access using the proper accessor functions.
NOTE: Opaqueness and "accessor functions" are not good in themselves.
Clear integer types, where the abstraction helps avoid confusion whether it is "int" or "long".
NOTE: Again, there needs to be a reason for this. If something is "unsigned long", then there's no reason to do typedef unsigned long myflags_t; but if there is a clear reason why it might be an "unsigned int" under certain circumstances and an "unsigned long" under other configurations, then by all means go ahead and use a typedef.
New types which are identical to standard types, in certain exceptional circumstances.
Although it would only take a short amount of time for the eyes and brain to become accustomed to the standard types like 'uint32_t', some people object to their use anyway.
When editing existing code which already uses one or the other set of types, you should conform to the existing choices in that code.
Maybe there are other cases too, but the rule should basically be to NEVER EVER use a typedef unless you can clearly match one of those rules.
In general, a pointer, or a struct that has elements that can reasonably be directly accessed should never be a typedef.
Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all know), and do one thing and do that well.
The maximum length of a function is inversely proportional to the complexity and indentation level of that function. So, if you have a conceptually simple function that is just one long (but simple) case-statement, where you have to do lots of small things for a lot of different cases, it's OK to have a longer function.
However, if you have a complex function, and you suspect that a less-than-gifted first-year high-school student might not even understand what the function is all about, you should adhere to the maximum limits all the more closely. Use helper functions with descriptive names (you can ask the compiler to in-line them if you think it's performance-critical, and it will probably do a better job of it than you would have done).
Another measure of the function is the number of local variables. They shouldn't exceed 5-10, or you're doing something wrong. Re-think the function, and split it into smaller pieces. A human brain can generally easily keep track of about 7 different things, anything more and it gets confused. You know you're brilliant, but maybe you'd like to understand what you did 2 weeks from now.
In source files, separate functions with one blank line.
In function prototypes, include parameter names with their data types. Although this is not required by the C++ language, it is preferred because it is a simple way to add valuable information for the reader.
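For instance (a hypothetical function):

    void setCellSize(double cellSize); // preferred: the name documents the argument
    void setCellSize(double);          // legal, but conveys less information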
Albeit deprecated by some people, the equivalent of the goto statement is used frequently by compilers in form of the unconditional jump instruction.
The goto statement comes in handy when a function exits from multiple locations and some common work such as cleanup has to be done. If there is no cleanup needed then just return directly.
Choose label names which say what the goto does or why the goto exists. An example of a good name could be "out_buffer:" if the goto frees "buffer". Avoid using GW-BASIC names like "err1:" and "err2:". Also don't name them after the goto location like "err_kmalloc_failed:".
The rationale for using gotos is:
- unconditional statements are easier to understand and follow;
- nesting is reduced;
- errors due to not updating individual exit points when making modifications are prevented;
- it saves the compiler work to optimize redundant code away.
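A sketch of the pattern (the function and its helpers are hypothetical):

    #include <cstddef>

    bool fillBuffer(char *buffer, std::size_t size);
    bool processBuffer(char *buffer, std::size_t size);

    int fillAndProcess(std::size_t size)
    {
        int result = 0;
        char *buffer = new char[size];

        if (!fillBuffer(buffer, size)) {
            result = -1;
            goto out_buffer;
        }

        if (!processBuffer(buffer, size)) {
            result = -1;
            goto out_buffer;
        }

    out_buffer:
        // Common cleanup: the label name says what the goto does
        delete[] buffer;

        return result;
    }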
Functions can return values of many different kinds, and one of the most common is a value indicating whether the function succeeded or failed. Such a value can be represented as an error-code integer (-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure, non-zero = success).
Mixing up these two sorts of representations is a fertile source of difficult-to-find bugs. To help prevent such bugs, always follow this convention:
If the name of a function is an action or an imperative command, the function should return an error-code integer. If the name is a predicate, the function should return a "succeeded" boolean.
For example, "add work" is a command, and the add_work() function returns 0 for success or -EBUSY for failure. In the same way, "PCI device present" is a predicate, and the pci_dev_present() function returns 1 if it succeeds in finding a matching device or 0 if it doesn't.
All public functions must respect this convention. Private (static) functions need not, but it is recommended that they do.
Functions whose return value is the actual result of a computation, rather than an indication of whether the computation succeeded, are not subject to this rule. Generally they indicate failure by returning some out-of-range result. Typical examples would be functions that return pointers; they use NULL to report failure.
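For instance (the functions and types below are hypothetical):

    class Work;
    class Cell;

    int addWork(Work *work);       // command: 0 on success, a negative error code on failure
    bool isDevicePresent(int id);  // predicate: a "succeeded" boolean
    Cell * findCell(long id);      // computation: the result itself, or nullptr on failure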
There appears to be a common misperception that gcc has a magic "make me faster" speedup option called "inline". While the use of inlines can be appropriate (for example as a means of replacing macros), it very often is not. Abundant use of the inline keyword leads to a much bigger executable, which in turn slows the program as a whole down, due to a bigger icache footprint for the CPU and simply because there is less memory available for the pagecache. Just think about it; a pagecache miss causes a disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles that can go into these 5 milliseconds.
A reasonable rule of thumb is not to use inline on functions that have more than 3 lines of code. An exception to this rule are the cases where a parameter is known to be a compile-time constant, and as a result of this constantness you know the compiler will be able to optimize most of your function away at compile time.
Often people argue that adding inline to functions that are static and used only once is always a win since there is no space tradeoff. While this is technically correct, gcc is capable of inlining these automatically without help, and the maintenance issue of removing the inline when a second user appears outweighs the potential value of the hint that tells gcc to do something it would have done anyway.
Wherever possible, don't use preprocessor conditionals (#if, #ifdef) in .cpp files; doing so makes code harder to read and logic harder to follow. Instead, use such conditionals in a header file defining functions for use in those .cpp files, providing no-op stub versions in the #else case, and then call those functions unconditionally from .cpp files. The compiler will avoid generating any code for the stub calls, producing identical results, but the logic will remain easy to follow.
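A sketch of this pattern, assuming a hypothetical ENABLE_FOO configuration macro:

    // In a header file:
    #ifdef ENABLE_FOO
    void setupFoo();
    #else
    inline void setupFoo() { } // no-op stub: the compiler generates no code for calls to it
    #endif

    // In a .cpp file, call the function unconditionally:
    void initialize()
    {
        setupFoo();
    }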
Prefer to compile out entire functions, rather than portions of functions or portions of expressions. Rather than putting an ifdef in an expression, factor out part or all of the expression into a separate helper function and apply the conditional to that function.
If you have a function or variable which may potentially go unused in a particular configuration, and the compiler would warn about its definition going unused, mark the definition as __maybe_unused rather than wrapping it in a preprocessor conditional. (However, if a function or variable always goes unused, delete it.)
At the end of any non-trivial #if or #ifdef block (more than a few lines), place a comment after the #endif on the same line, noting the conditional expression used. For instance:
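    #ifdef ENABLE_FOO // hypothetical configuration macro
    // ... many lines of code ...
    #endif /* ENABLE_FOO */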
Comments are good, but there is also a danger of over-commenting. NEVER try to explain HOW your code works in a comment: it's much better to write the code so that the working is obvious, and it's a waste of time to explain badly written code.
Generally, you want your comments to tell WHAT your code does, not HOW. Also, try to avoid putting comments inside a function body: if the function is so complex that you need to separately comment parts of it, you should probably split the function into smaller functions. You can make small comments to note or warn about something particularly clever (or ugly), but try to avoid excess. Instead, put the comments at the head of the function, telling people what it does, and possibly WHY it does it.
Try to keep header files free of comments; when comments are inside headers, all users of those headers must be recompiled if a comment is changed.
Each class should be fully commented with a doxygen comment block. A doxygen comment block is a special comment block with some additional markings, so doxygen knows it is a piece of structured text that needs to end up in the generated documentation. The preferred way to mark a doxygen comment block is the JavaDoc style, which consists of a C-style comment block starting with two *'s, like this:
    /*!
     * ... text ...
     */
The doxygen comment block of a class should include a general description of the class and a list of its features and possible limitations.
A doxygen comment block should be added for every method of a class (both public and private methods should be commented); this comment block should contain a general description of the method and a description of all the arguments and of the return value. For instance:
    /*!
     * \brief Brief description.
     *        Brief description continued.
     *
     * Detailed description starts here.
     *
     * \param var detailed description of the parameter
     * \result Detailed description of the return value.
     */
To document the members of a file, struct, union, class, or enum, it is sometimes desired to place the documentation block after the member instead of before. For this purpose an additional < marker is required in the comment block. Note that this also works for the parameters of a function. For instance:
    int var; //!< Detailed description after the member
To simplify the documentation of class members, define just one member per line (no commas for multiple members declarations).
Doxygen commands should start with a backslash (\).
As a rule of thumb, the code should run through doxygen without generating any warnings; in fact, doxygen is sometimes helpful at pointing out inconsistencies in your class declaration.
bitpit provides a .clang-format file that can be used by clang-format to automatically format the code. Code formatting can be enforced manually by calling clang-format directly or automatically through the CMake targets defined by bitpit. Each bitpit module provides a CMake target called clang-format-<MODULE_NAME> that formats all the code of the module. There is also a Git pre-commit hook which uses the "clang-format git hooks" tool to ensure the changes are properly formatted.
As most of our code repositories use git as the revision control system, it is important to decide on a workflow that can be followed by the individual developer. The way that any individual developer interacts with the upstream git repository can have an important impact on other developers and on the ability to identify and manage individual changes. This set of guidelines and practices attempts to establish some standards for how developers will interact with the upstream git repository.
As a general rule, developers should update frequently, and commit changes often. However, the repository should always remain in a state where the code can be compiled. Most of the time, the code should also successfully execute "make check" run from the top-level directory. If you commit code that violates this principle, it should be your first priority to return the repository code to a compilable state, and your second priority to make sure "make check" runs without errors. Although it would be possible, and many software projects do it, we prefer not to force successful execution of the test suite before every commit. Developers should make every effort to avoid the need for such a constraint by running "make check" before every commit.
Commits to the repository should also come with a non-trivial and useful log message.
A critical concept is that all changes shall be developed outside of the master [1] branch. Whether they are in a different branch of the upstream [2] repository (gitflow) or a branch of an entirely different fork (forking workflow) is secondary. This is a well-established concept regardless of the workflow being adopted, and allows a number of other benefits as described below.
There are a number of benefits of working on a different fork rather than a branch of the upstream repo, although they are not strictly technical:
Although this can be imposed technically by limiting the authority to alter the upstream repo (as in the forking workflow), a healthy developer community can also simply rely on convention. The advantage of doing it by convention rather than by restriction is that it is easier to distribute the load of reviewing and accepting changes. A critical consequence of this decision is that all code is reviewed before it is committed to the upstream master branch. This has benefits to overall quality in two related ways:
This practice does, however, place a substantial burden on the developers to perform timely reviews of the pull-requested (PR'ed) code. PRs that languish without sufficient review have a number of negative consequences:
[1] Although a repository may choose a different name for its main development branch, this document will refer to that as the "master" branch.
[2] For this discussion, the "upstream" repo will refer to the centralized authoritative repository used to synchronize changes.
Given the above practices, there are some mechanical details that can help ensure that the upstream/master repository is always in a state that facilitates all repository actions and interactions.
Feature branches being used for development should be kept up-to-date with the upstream/master by rebase only. When a feature branch is rebased against the upstream/master, all changes in the upstream/master are inserted into the feature branch at a point in its history that is prior to any of the changes of the feature branch. This can require conflict resolution as the feature branch changes are "replayed" on top of the new upstream/master in its current state. The primary advantage of this policy is that it keeps all of the feature branch changes contiguous. If, by contrast, the upstream/master is merged into the feature branch, the recent changes in the upstream/master become woven into the set of changes in the feature branch. This can make it more difficult to isolate changes later on.
Strict adoption of this practice is important since a single merge into a feature branch that is then merged back into the upstream/master can make it nearly impossible for others to rebase.
A typical workflow for a pull request might look like this, all using the command line, except for submitting the final pull request. Note that there is never a merge operation.
When you are ready to submit a pull request, be sure that your feature branch is up-to-date:

1. Update your local master branch before anything else. This first step may seem redundant, but it is here to be clear which branch we are acting on.
2. Rebase your feature branch onto the updated master. This may generate conflicts that can be addressed at this point. NOTE: This step can be performed at any time and should be performed as often as practical to reduce the scope of potential conflicts.
3. Push the rebased feature branch to origin and submit the pull request.
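With hypothetical names (a feature branch called my-feature and a remote called origin), the commands might look like this:

    git checkout master
    git pull origin master                          # update the local master branch first
    git checkout my-feature
    git rebase master                               # replay the feature branch on top of master
    git push --force-with-lease origin my-feature   # update the branch backing the pull request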
When ready to be adopted into the upstream/master, feature branches should be combined by merge only. This adds the changeset to the end of the upstream/master as a set of individual commits but in a contiguous block.
A typical workflow to merge a pull-request might look like this, all using the command-line.
1. Update your local master branch before anything else (just because it's never a bad idea!).
2. Merge the feature branch into master and push the result to the upstream repository.
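Assuming the same hypothetical names, and that the feature branch has been rebased as described above, the commands might look like this:

    git checkout master
    git pull origin master   # update the local master branch first
    git merge my-feature     # fast-forwards, keeping the changeset contiguous
    git push origin master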