Well, my thoughts continue to be wildly rambling and incoherent every time I start to write stuff down. This stuff is not easy!!
I am just going to dump a bunch of stuff in here in the hopes of starting a logical discussion. First of all, I still do not know what "best practices" are, or how things "should" be done. We are basically at a crossroads with our build system. We are ready to commit to Pulse I think, and even design around it, but we want to do things right. So, first off - let's describe a few types of builds.

Triggered (Manual or Auto) Builds from HEAD (nightly, latest.integration, whatever you want to call it):
--------------------------------------------------------------------------------------------------------
- Build using the latest version of source (HEAD)
- Build using the latest built versions of dependencies found on the build server (which implies we need access to build artifacts in other, dependent projects)
- Post-build, tag every input with a unique buildID.
- We may also want to apply a "LATEST_INTEGRATION" tag so that others can pull the latest integration version of something without having to know a specific build ID. This would be more useful for binaries that are checked into SCM. Is there another use case for this?

"Building" a Release (or just "Releasing Software")
---------------------------------------------------
- Building a release. Hmm... The problem I have with building a release is do we always "build" a release, or do we "promote" an existing nightly or integration build to a release? Which method should one choose?

Always "Building" a Release From a Tag
--------------------------------------
If every nightly build or integration build tags sources in SCM, then it is a simple matter of running another "release build" from a tag using Pulse's prompting capability. This will fire off another fresh build using the given tag and will theoretically produce the exact same results.
However, you could run a separate target in your build scripts (maybe a target named...go figure..."release"?) that not only built artifacts, but also performed various release-related tasks like publishing artifacts somewhere, archiving binaries officially, etc.

"Promoting" an Existing Build to a Release
------------------------------------------
Promoting a build is essentially the same as building a release from a tag, except that the build has already occurred. This means you could theoretically skip the whole build process and just run any post-build, release tasks. This seems much more efficient, but somehow less "right" than always building fresh from a tag. Does anyone have any thoughts on this?

Building a Patch To a Release
-----------------------------
- Frankly, I don't know what to do here. Building a patch to an existing release means building with respect to an existing tag in SCM. This means some projects will be built from the old code and the old tag, but other stuff may be at a different branch level. I guess this would mean that it is up to the user to tag these branches appropriately, and somehow ensure that the dependencies are pointing to the correctly tagged versions. It is really this dependency specification that is the difficult part. Without checking in some sort of dependencies file with hard-coded tags for all dependencies, I'm not sure how to do this properly.

OK, so what about dependencies? Depending on the build scenario, we need to be able to find different versions of a build dependency.
- For integration and nightly builds, you want to build using the latest version of a dependency, which means the version you just built, or the most up to date version available. This implies access to the artifacts on the build server.
- For building from a tag, you will want a previous version of an artifact.
This version will either be tagged itself and checked into a repository, or will simply be the result of a tagged build of a dependent project. For complete reproducibility (timestamps are the same) you basically need to check the artifact in somewhere, or archive it somehow. Then you need some way of figuring out where to get it from (I think you'll need something like Ivy for this, or check your binaries into SCM). For simple binary reproducibility only (the timestamp can be different, but the artifact is binary-identical), it is only required to rebuild a dependency from tagged source and use it in your build, in which case you revert to the case described above where you just need to find an artifact on the build server, the same as you would for a nightly build.

Requirements: Access to specific versions of build artifacts from dependent projects implies that when a project is built, it should build its dependent projects first to ensure the build artifacts it requires are up to date. This means:
a) A project needs to know about its dependencies.
b) The build server needs to provide access to the built artifacts by exposing them in a known, repeatable manner (not some randomly generated path/url)
c) To handle both the nightly build scenario as well as the build from a tag scenario, we need to be able to grab the version of an artifact based on the tag used to build the dependency ("HEAD" versus "SOME_TAG").
d) The build server needs to know if its dependencies have changed since the last build so it can rebuild the project if required.

So what does this have to do with Pulse?
- We need to be able to declare a dependency on another Pulse project. Ideally, it should be possible to version this declaration as part of the versioned pulse file so that changes in dependencies can be captured. This declaration will hook up some behaviour:
  - Will expose the dependent project's artifacts on the Pulse server to this project.
- There should be a way to specify the version of a dependency. There should be a special "latest" version, meaning just use the latest version on the Pulse server. You should also be able to supply a fixed version of a dependency, meaning a tag in SCM of the project that creates the artifact being depended on.

PROBLEM: Let's say you are running an on-demand or nightly build set to run on HEAD that also uses the "latest" versions of all dependency artifacts. No problem. Your build runs, builds everything, and tags all resources in SCM with a tag (let's say v0001). Now, it is a week later and you have performed 'x' more builds, but you want to go back and rebuild v0001. You initiate a build manually, and you have the build server prompt you for a tag. You enter 'v0001', and a rebuild starts. But how does the build server know to use "v0001" of all dependencies, and not the "latest" version?
- Perhaps we need the option of inheriting the version/tag of the project declaring the dependency. So, whatever tag you use to build a project, use the same tag when fetching/building dependencies.
- You will also want to be able to specify a hard, explicit version of a dependency. For example, we may want to manually build a toolchain (a compiler, for instance) and check it into SCM, tagging it with a specific version (this is something we do now). Then we can declare a hard dependency on that binary, using a specific version, and ship a binary- and timestamp-identical version of an executable.

I guess the main issue here is, does this stuff even belong in Pulse, or should we just use something like Ivy, and write some code to hook it into the Pulse API to trigger builds of dependencies?

Sorry if this is all gibberish. I figured rather than waste more time thinking about it, let's start discussing some of this stuff. If any of this is completely off-base, just let me know. Also, you guys are the build experts, right?
I'd love for you to tell me how my build system *should* work. ;o)

Mark.

At the risk of providing too much confusing information, here is a somewhat more logical workflow with some questions that arose in our requirements gathering. A little more detail about our tools is in order. We have an Eclipse-based product that ships various toolchains, include files, sample code, etc. for embedded development. We have a somewhat circular dependency in that the final "Installer" including the entire Eclipse-based development environment also includes compiled sample code, which is actually a set of projects intended to be used with the installed Eclipse product and built using our plugins. But this is really a bootstrapping problem. It doesn't have to be circular if our build was granular enough. So, assuming we can break out the build plugins of our final product into a small component called the "Firmware Builder" in the following example, life is good. Here we go:
==============================================================================
Example Workflow
==============================================================================
Typical workflow for a series of dependent projects containing binary artifacts that must be preserved (including timestamps). Given the following scenario:

Component: "Project Y"
- Depends on its own source code
- Also requires the "Include Set B" component
- A firmware project that requires "Library X" to build
- Also requires the "Tool A" and "Tool B" components
- Built with a headless IDE instance (component "Firmware Builder"), so also requires the "Firmware Builder" component
Artifacts: "projectY.abs" (an embedded executable)

Component: "Include Set B"
- A set of include files, simply checked out of source control using a specific tag
- No dependencies
Artifacts: "include1.inc", "include2.inc", "include3.inc"

Component: "Library X"
- Depends on its own source code
- A library that requires the "Tool A" and "Tool B" components
- Also requires the "Include Set B" component
- Built with a headless IDE instance (component "Firmware Builder"), so also requires the "Firmware Builder" component
Artifacts: "libraryX.lib"

Component: "Firmware Builder"
- Depends on its own source code
- A set of plugins contained in a feature that, when combined with at least an Eclipse platform runtime and any required toolchain binaries, can be used to build firmware in headless mode
- Can be spawned with a specific call to Java, or using build.bat (which would then imply a dependency on a component containing build.bat)
NOTE: In order for the firmware project to be built properly, using the Firmware Builder, it will require the appropriate toolchains, include files, and possibly other firmware binaries (libraries) to be present in appropriate locations (bin, include and lib folders). These really aren't dependencies of the Firmware Builder, they are more dependencies of the firmware component being built.
Artifacts: "featuresAndPlugins.zip" (or multiple JAR files)

Component: "Tool A"
- Depends on its own source code
- Requires tools be installed on the build machine, but has no other dependencies
- Built with dmake
Artifacts: "toolA.exe"

Component: "Tool B"
- Depends on its own source code
- Requires tools be installed on the build machine, but has no other dependencies
- Built with dmake
Artifacts: "toolB.exe"

==============================================================================
Standard Build Loop
==============================================================================
- Initiate a build of "Component"
- Read in the dependencies for "Component" and recurse for each dependency
- Copy required resources to appropriate staging locations
- Build "Component"
- Deploy any artifacts

==============================================================================
Specifically Building "Project Y"
==============================================================================
Now, to specifically build "Project Y" above:
- Check out the source for "Project Y"
- Read (or calculate) the dependency info for "Project Y" (can possibly perform the above two steps in the other order)
  Result: Include Set B, Library X, Tool A, Tool B, FirmwareBuilder
    - Check out "Include Set B" component
    - Read (or calculate) dependency info for "Include Set B"
      Result: No dependencies
    - (Optionally) publish "include1.inc", "include2.inc", and "include3.inc" somewhere
    - Check out "Library X" component
    - Read (or calculate) dependency info for "Library X"
      Result: Include Set B, Tool A, Tool B, FirmwareBuilder
        - Check out "Include Set B" component
        - BUT WAIT!
          Immediately realize that "Include Set B" is already checked out and move on
        - Check out "Tool A" component
        - Read (or calculate) dependency info for "Tool A"
          Result: No dependencies
        - Copy required resources and set up environment for build
        - Call dmake to build "Tool A"
        - Publish "toolA.exe" somewhere
        - Check out "Tool B" component
        - Read (or calculate) dependency info for "Tool B"
          Result: No dependencies
        - Copy required resources and set up environment for build
        - Call dmake to build "Tool B"
        - Publish "toolB.exe" somewhere
        - Check out "FirmwareBuilder" component
        - Read (or calculate) dependency info for "FirmwareBuilder"
          Result: No dependencies
        - Copy required resources and set up environment for build
        - Call PDE builder to build "FirmwareBuilder" (What about getting a PDE basebuilder package? This could be another dependency level?)
        - Publish "featuresAndPlugins.zip" somewhere
    - Copy required resources and set up environment for build
    - Call firmware builder to build "Library X" (What about getting an eclipse runtime package? This could be another dependency level?)
    - Publish "Library X" somewhere
    - Check out "Tool A" component
    - BUT WAIT! Immediately realize that "Tool A" is already checked out and built and move on
    - Check out "Tool B" component
    - BUT WAIT! Immediately realize that "Tool B" is already checked out and built and move on
    - Check out "FirmwareBuilder" component
    - BUT WAIT! Immediately realize that "FirmwareBuilder" is already checked out and built and move on
- Copy required resources and set up environment for build
- Call firmware builder to build "Project Y" (What about getting an eclipse runtime package? This could be another dependency level?)
- Publish "projectY.abs" somewhere

==============================================================================
Key Requirements from Above Example
==============================================================================
- Ability to realize that a component is already available (checked out) and to share it with other components that have common dependencies. This involves the ability to cache component artifacts into a common cache. Component artifacts can be simple source files or built binaries.
- Ability to "publish" artifacts to a repository for two reasons:
  1) So that the build system can avoid rebuilding artifacts that do not need to be rebuilt from source.
  2) We need to be able to guarantee that we are building from binary-identical components. This includes all toolchains, firmware (libraries) and sample code. This implies that we need a "versioning" mechanism so previous versions (preserving timestamps) can be maintained and used to build previous versions of a product. This is currently done with CVS and is a pain to maintain manually.
- We need to be able to reproduce *exactly* an old build from tagged components and source. This means we need to be able to go to the build server and basically trigger a build from a tag. Can any of the current CI server tools do this?

==============================================================================
$10,000 Questions
==============================================================================
- What about "wrapper plugins" that package platform-dependent, native libraries into Eclipse? Ideally, they should not have static .jars and .dlls checked into CVS, but should be built (really "assembled") at build-time, or pull them from a central, binary repository somewhere. To do this *properly* we should have these plugins depend on the projects that build these artifacts and build them as required.
This would make setting up a development workspace a little harder and will require these extra steps to be somehow scripted for the user.
- Can we achieve all of this through the CI server (i.e. setting up project dependencies/triggers in the build server itself), OR do we want to involve a separate dependency management tool like Ivy?
- What do we publish? Every binary from every single build? Ideally, components whose source and dependencies have not changed will not have changed, so nothing should be rebuilt or published unless it has actually changed.
- Do *any* of the dependency tracking alternatives (Ivy, Maven2, or even project dependencies within the CI server itself) support the notion of building from the "latest" version of something, where the "latest" version means the last binary in the artifact repository, OR rebuilding from source and checking in a new binary if the dependencies of a component have changed? I think Ivy can do something like this but we need to set up some test projects. If we do implement this "latest version" thing, how would we then build an old, tagged version? If all the scripts are pointing to the "latest version" I don't see how this could work unless the "latest version" algorithm was basically the "latest version with respect to the tag you are building from". I guess this would have to be a date/time-based thing, OR we would have to maintain a list of tags like we do now and always refer to static versions in our dependencies. This would be a nightmare to maintain.

Whew. Sorry for all the info, but I am interested in hearing any recommendations. I'll attach a block diagram if I can figure out how to do it here (if that is even possible).

A simple block diagram of the workflow mentioned in the description.
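
For reference, the "Standard Build Loop" and the "BUT WAIT!" shortcuts in the "Project Y" walkthrough might be sketched roughly as below. This is only an illustration in Python, not anything Pulse or Ivy provides: the `DEPENDENCIES` table is copied by hand from the component list in the description, `build` is a hypothetical helper, and the real checkout/build/publish steps are replaced by appending to a log. It also assumes the dependency graph has no cycles, so the circular "Installer" case would need to be broken up first.

```python
# Hypothetical dependency table, copied from the example components above.
DEPENDENCIES = {
    "Project Y": ["Include Set B", "Library X", "Tool A", "Tool B", "Firmware Builder"],
    "Include Set B": [],
    "Library X": ["Include Set B", "Tool A", "Tool B", "Firmware Builder"],
    "Firmware Builder": [],
    "Tool A": [],
    "Tool B": [],
}


def build(component, done=None, log=None):
    """Build `component` after its dependencies, visiting each component once."""
    done = set() if done is None else done
    log = [] if log is None else log
    if component in done:
        # The "BUT WAIT!" case: already checked out and built in this run.
        return log
    for dep in DEPENDENCIES[component]:
        build(dep, done, log)
    # Stand-in for: stage resources, run the real build, publish artifacts.
    log.append(component)
    done.add(component)
    return log


order = build("Project Y")
print(order)
```

The resulting order puts "Include Set B", "Tool A", "Tool B" and "Firmware Builder" ahead of "Library X", with "Project Y" last, and each component appears only once even though several components share dependencies.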
Hi Mark,
Wow, thanks for providing such a detailed overview of your requirements. This will certainly be useful for when we add further dependency management features to Pulse. I have read through a couple of times, and will no doubt do so again in the future. Some initial comments that come to mind:

* It is great to hear that you are looking to commit to Pulse, but I would also say that it is to your benefit to design something that does not rely directly on Pulse any more than is necessary. As a vendor we do not try to lock people in, so whatever dependency support comes in Pulse will be aimed to play nice with the outside world. The primary reason for this is that a developer building on their own machine should have all the power of the dependency management they need.
* I really think you should take a look at Ivy. Despite the learning curve, it solves a lot of the issues you mention. We use it here, and although I am hardly an expert in it I can say it has done us well so far. It gives you a way to declare and resolve dependencies, along with the publishing and delivery of artifacts. It understands dependencies on the "latest", and can also fix versions in delivered dependency descriptors. The downsides we have found are sometimes confusing caching behaviour (they are looking to improve this), and the need to create dependency descriptors for third party libraries (as the public repositories are far from complete). The latter is sometimes a blessing in that we are not bound to possibly incorrect descriptors as is sometimes the case in the Maven world. You could also consider Maven although I think it will be difficult to retrofit to your existing projects, and maybe not as flexible as you need (without considerable hair-pulling). The good thing about Ivy is it sticks to one thing: dependency management.
* To build against the "latest" dependencies, I would consider if you really need to "build on demand".
That is, do you need the requiring project's build to trigger a build of a dependency if it is out of date? The alternative is to build the required project on every change so that the latest is always built and available as artifacts in your repository. I think this latter method, if suitable, will simplify the build system considerably. This is also a place where Pulse can help, as it can schedule the builds on every change and make sure the artifacts are always up to date.
* If you have dependencies that are bound tightly enough that "build on demand" is necessary, you might consider versioning them all together. You lose granularity, but this can also simplify things greatly.
* Regarding building releases from a tag vs promoting existing artifacts: these are both legitimate ways to do releases (provided your build system is robust enough to ensure a rebuild gives the same results) each with their own pros and cons. The rebuild model is usually simpler to retrofit to an existing system, and is easy to get working. The promotion model appeals as you can take existing built and tested artifacts and release them without fear of a difference in the rebuild and in less time. Promotion requires that every build be "releasable", i.e. there should not be special pre-build steps for release builds (though post-build steps can happen at promotion time). This is a good goal to aim for in any case. We aim to add some more promotion features to Pulse, and are discussing requirements with a few customers.
* Reproducibility is another important goal. This can be achieved by ensuring all source material is tagged in the SCM on each release, and all dependency information is recorded either in the SCM again or in another repository. This presumes all dependency information is stored explicitly in descriptors (like Ivy files) that can be stored appropriately.
Where things normally depend on the "latest", you should fix the dependency versions at release time and deliver a descriptor with the fixed versions to your permanent repository.
* Building a patch is a lot easier when you have proper reproducibility. The question then is dealing with what changes you have to make, which is quite dependent on your SCM usage (branching/tagging).
* Regarding how all this fits with Pulse, it depends if you are going to use an external tool to manage publishing and delivery of artifacts and dependency descriptors. If you do, you can get pretty much everything you absolutely need from Pulse today. That is, you can use Pulse to trigger builds on every change to keep artifacts up to date, and you can trigger builds of projects when their dependencies change with build completed triggers. This is our preferred solution, but that does not mean we will not be adding more functionality into Pulse in this area. As I say, you can get everything you really need, but there are still ways we can improve and simplify. We will still be looking to simplify the integration with Pulse, initially around how artifacts are promoted for release and how dependent artifacts are delivered between projects. We also would like to keep the dependency configuration in Pulse as declarative as possible, purely for simplicity.

Those are my main thoughts for now. I hope some of it proves useful, and look forward to continuing the discussion.

Hi Jason,
It has been a long time since I posted to this bug and things have become "less muddy" since then. I think after much thought and deliberation with myself - ;) - that keeping things built and up to date makes more sense than building on demand. I am a little worried about wiring up all of my dependent projects, but we'll see how it goes I guess. If it gets completely out of hand I guess I could just employ some scripting via the remote API.

I am interested in finding out more about how you may have set up Ivy. Most importantly, I am a little fuzzy on how you keep track of revisions. I assume a revision is calculated at build time and filled in for "latest.<whatever>". This then gets persisted in the Ivy file in the Ivy repository. But where this seems to fall apart for me is when you want to go back and reproduce a build from 3 weeks ago. So you check out source tagged with a build number from 3 weeks ago and build it. But if all of your build scripts include in them dependency declarations of "latest.<whatever>", won't you get the version of all of your dependencies you just built from the Ivy repository instead of the versions you actually *want* from 3 weeks ago? This is the part I have never understood about Ivy - the "going back in time" part.

Feel free to take this offline and email me directly. I am interested in how you manage the fact that it can be configured with so many repositories and caches, and whether you use configurations or not. Also, how do you manage backups? It seems there is no rollback mechanism if something gets blown away - you'd better have a good backup system if you are worried about reproducibility or you'll lose all of your versioned artifacts.

Trying to clean things up a little. We currently have a couple of very large buckets of tasks that are really not indications of our scheduling. What I am doing, for starters, is moving all of the 2.x. items into x.x to indicate that they have not been scheduled.
We can later go through these and pick up with ones that we intend to complete for an upcoming release.
Marking as resolved after significant work and rework leading to a nice solution for delivering captured artifacts from one build to another. Several other pieces of the dependency puzzle have also been solved, and there are many more, but for those remaining more targeted issues should be raised.
I think another key feature here is to provide the ability to look for version "X" of an artifact, and if it isn't there, build it from source using "Project X" (recursing down to transitive dependencies).
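
A rough sketch of that lookup-or-build behaviour, written in Python purely for illustration: the `repository` dictionary, the `PROJECT_DEPS` table and the `fetch_or_build` helper are all hypothetical stand-ins, not part of Pulse or Ivy. It also assumes that a dependency is requested at the same tag as the project that needs it (the "inherit the tag" option raised earlier in this issue) and that the dependency graph is acyclic.

```python
# Hypothetical in-memory stand-in for an artifact repository.
repository = {("Tool A", "v0001"): "toolA.exe@v0001"}

# Hypothetical project metadata: which projects each project depends on.
PROJECT_DEPS = {"Library X": ["Tool A", "Tool B"], "Tool A": [], "Tool B": []}

built = []  # records which (project, version) pairs had to be built from source


def fetch_or_build(project, version):
    """Return the artifact for (project, version), building it if it is missing."""
    key = (project, version)
    if key in repository:
        return repository[key]
    # Not published yet: resolve transitive dependencies at the same tag first.
    inputs = [fetch_or_build(dep, version) for dep in PROJECT_DEPS[project]]
    # Stand-in for checking out `version` of the project and running its build.
    artifact = f"{project}@{version} built from {len(inputs)} input(s)"
    repository[key] = artifact  # publish so later requests reuse it
    built.append(key)
    return artifact


fetch_or_build("Library X", "v0001")
print(built)
```

Here "Tool A" is already in the repository at v0001, so only "Tool B" and "Library X" are built; asking for "Library X" at v0001 a second time builds nothing.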