This week I’ve been helping some developers with unit testing adoption, which raised an interesting topic that I’ve not seen explicitly addressed.
Unit testing is great when you are working with simple data but what if you have larger or more complex data such as waveforms or images?
I’ve used a couple of techniques over the years:
Maybe this is an obvious one – but the first option is to identify whether the method can still be applicable on a subset of the real data.
For example, on an application where I’m doing some image processing, the real data will be 256×256 pixels.
However, my tests run over 3×3.
This is still applicable to the larger image as many algorithms involve handling edge conditions and the normal implementation in the middle. The larger arrays will have more of the normal implementation but the smaller tests will still cover the edge conditions as well (which is often where things can go wrong!).
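The same idea can be sketched in text form. This is a minimal Python illustration, not the LabVIEW code from the application: box_blur is a hypothetical filter that averages each pixel with whichever neighbours fall inside the image, so a 3×3 input exercises the corner, edge and centre paths in a single test.

```python
def box_blur(image):
    """Average each pixel with its in-bounds neighbours in a 3x3 window."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def test_box_blur_small_image():
    # A single bright pixel in the middle of a 3x3 image.
    result = box_blur([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
    assert result[0][0] == 2.25  # corner: 4 pixels in the window
    assert result[0][1] == 1.5   # edge: 6 pixels in the window
    assert result[1][1] == 1.0   # centre: full 9-pixel window
```

A 256×256 image would only add more centre pixels; the boundary handling being checked here is the same.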
In some cases, you need a full input set but you are just measuring some properties.
An example might be a frequency processing function where we want to extract the size of the peak at 1kHz.
The FFT parameters change a lot based on the size of the data you put in so really we want to use the actual size we expect in the application. So instead what I have done in the past is to write a basic routine to generate an appropriate signal.
In the example above I use generators to produce a multitone signal, perform the frequency analysis and manipulation which I am testing and then compare just the components of interest.
(Note: this is before I got a bit more structured with my testing!)
Credit to Piotr Demski at Sparkflow for pointing out an important point I missed when I first published this. If you are generating data – it should be the same every time i.e. beware random data sources. Your tests may fail without anything changing.
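As a rough text equivalent of the generated-signal approach, here is a Python sketch using only the standard library. The sample rate, tone frequencies and function names are assumptions made for illustration, and amplitude_at simply stands in for whatever frequency processing is under test. Any noise comes from a seeded generator so the data is identical on every run.

```python
import cmath
import math
import random

SAMPLE_RATE_HZ = 8000
NUM_SAMPLES = 800  # 10 Hz per bin, so 1 kHz falls exactly on bin 100


def multitone(tones, noise_amplitude=0.0, seed=1234):
    """Sum of sine tones given as (frequency_hz, amplitude) pairs.

    Noise is drawn from a seeded generator, so repeated calls match exactly.
    """
    rng = random.Random(seed)
    signal = []
    for n in range(NUM_SAMPLES):
        t = n / SAMPLE_RATE_HZ
        sample = sum(a * math.sin(2 * math.pi * f * t) for f, a in tones)
        sample += noise_amplitude * rng.uniform(-1.0, 1.0)
        signal.append(sample)
    return signal


def amplitude_at(signal, frequency_hz):
    """Single-bin DFT: peak amplitude of the component at frequency_hz."""
    total = len(signal)
    k = round(frequency_hz * total / SAMPLE_RATE_HZ)
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / total)
              for n, x in enumerate(signal))
    return 2 * abs(acc) / total


def test_peak_at_1khz():
    signal = multitone([(1000, 2.0), (2500, 0.5)])
    # Compare only the component of interest, with a tolerance.
    assert abs(amplitude_at(signal, 1000) - 2.0) < 1e-6
```

The test keeps the full data size the application would use but only asserts on the one property it cares about.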
The approaches above may not work if it isn’t obvious how to generate the data or you can’t generate the important elements easily. They also only work where the problem is importing data – but what if you need to compare a large result?
Here I will revert to storing a constant of some reference data, normally by running the system, using a probe on the data and copying it to my test.
On the input, this can work without any compromise.
For expected cases there is an obvious catch – if you generate data from the VI under test it will of course pass. However, if you have no way of generating an expected output then we have to compromise.
Instead, we can write the algorithm until it works (validated manually) and capture that data for the test. This means the test doesn’t guarantee the output is right. However, it will fail if you do anything that alters the output, which gives you a chance to catch potential regressions.
Another path is to use a known good algorithm (perhaps from another language) to capture the expected output in a similar way. This is fantastic when you are looking to replace or replicate legacy systems since you can take data directly from them.
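A captured-reference test might look something like the Python sketch below. The algorithm (running_mean) and the stored values are hypothetical stand-ins; the point is only the shape of the test: a frozen constant captured from a validated run, compared with a tolerance so floating-point noise does not cause failures.

```python
import math

# Captured once from a manually validated run, then frozen in the test.
REFERENCE_OUTPUT = [2.5, 4.666667, 9.666667, 16.666667, 20.5]


def running_mean(data):
    """Hypothetical algorithm under test: 3-point mean, truncated at the ends."""
    result = []
    for i in range(len(data)):
        window = data[max(0, i - 1):i + 2]
        result.append(sum(window) / len(window))
    return result


def matches_reference(actual, expected, tolerance=1e-6):
    """Element-wise comparison within an absolute tolerance."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, abs_tol=tolerance) for a, e in zip(actual, expected)
    )


def test_running_mean_regression():
    actual = running_mean([1.0, 4.0, 9.0, 16.0, 25.0])
    assert matches_reference(actual, REFERENCE_OUTPUT)
```

If the reference came from a known good implementation elsewhere, the constant is the only part that changes.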
Catch It At Different Levels
In some cases, it may simply not be viable to create a unit test for the algorithm. That’s OK – 100% coverage isn’t a useful goal in most cases anyway. Instead, consider:
Try and unit test the components inside the algorithm (depending on the structure of your code). This will provide some confidence.
Make sure this algorithm is verified in system or integration level tests. Ideally, find ways to make it fail loudly so it isn’t likely to be missed.
I hope that gives you some ideas for your use case. Feel free to post any questions you have.
I’ve been thinking a lot lately about tooling. I’ve been getting my CI system running smoothly (ish, more on that later) as well as exploring another new language and working on open source projects on GitHub.
I’ve realised that there are specific penalties that we have to face as LabVIEW developers that are largely down to the graphical nature of LabVIEW.
In my head, I think of this as the graphical tax and thought it would be interesting to put these out there, see if there are solutions I’m missing or at least let’s have a discussion.
Software Engineering Tools
This one is an easy place to start.
Most software engineering tools assume a text-based system. Git works incredibly well on text. GitHub can merge requests in the browser on text-based files. CI tools use a text-based interface (the command line) to automate tasks like building, testing and linting your code.
As LabVIEW developers, we miss out on a lot here:
I’ve tried to plug the gap to CI tools with G CLI (https://github.com/JamesMc86/G-CLI) but it is far from seamless as LabVIEW still has many cases where it pops up dialogues.
Git etc. do work but don’t understand the files, so it is manual merges and diffs all round.
Any tooling for the language has to start from scratch. We can’t leverage what already exists.
This one is frustrating as software engineering marches on – we have to be careful not to get left behind. NXG promises to help a little with a human-readable VI format but I’m not yet convinced it will solve many problems (though some I’m sure).
This is one that I think is really fundamental and just something we have to accept.
Compared to LabVIEW, text-based style seems simple! Spaces vs tabs, newlines or not, camel-case or snake-case. There are certainly huge debates but it is essentially a 1D problem. It’s also easier to manage with linters and code analysis tools.
With LabVIEW, we take that problem and add new dimensions. Style is literally 2D. There is so much room for variation. So many mitigating circumstances for given diagrams that having a common set of skills is hard.
With text, you can add your own visual style on top with your editor while keeping the code “pure”. With LabVIEW, if you want different colours for your background it saves into the code.
A great example was a pull request I got on GitHub. After downloading the code to my machine many of the comments were too small. When I requested they were made larger to fit the text, they sent back a screenshot showing it fits on their screen.
This is an area that interests me in understanding more. I’m not sure if it is hopeless or we just need to be stricter. It certainly feels like a barrier to more collaboration and so something that should be considered.
Web Tools for Review
I think this is going to get harder and harder over time.
Increasingly the web is the centre of our working world. Text translates to that world well.
Review is also getting bigger. The place this really hurts for me is in pull requests on GitHub. When I receive a pull request for the C# part of G CLI it can be a 5-minute process. I can review the diffs in GitHub and understand what is going on.
With LabVIEW, you are forced to pull the project down to the desktop to open the IDE and view it. So when I’m not on my main dev machine (as is the case while I write this), I may not be able to provide any comment on what is happening.
This could be improved by having sites with plugins for LabVIEW that can generate code images and diffs. I know some work has been done on this in Jenkins so it can be improved. But it will never be as straightforward as commenting on text code (line by line comments are handy).
So why pay the graphical tax? Or:
If you’re just going to moan about it James then use text-based languages
Well, there are still some hugely compelling reasons:
I believe it is easier to understand program structure graphically which greatly speeds up development and debugging. At least I’m a very graphical thinker.
There are no other tools that allow you to target desktop, real-time and FPGA systems with very little difference in syntax and concepts.
Even on a single target – there are not many other languages that combine a fairly high level of abstraction and the level of performance that LabVIEW provides.
So I still think I’m at a net gain from using LabVIEW. I have no plans to jump ship any time soon. But maybe, by sharing my concerns, I might trigger some thoughts on how we can tackle these issues.
We have long known that VI Analyzer is a good idea – much like unit testing – people on the other side of adoption swear by it.
We’ve found a few hurdles for mainstream adoption into our process, and I suspect yours too.
1. Understanding Why
The first step in adopting VI Analyzer (and keeping that adoption going) is understanding why you are doing it!
What I mean by this is you need to make the tool fit your process – not the other way round. If you use VI Analyzer because NI says so then you’re going to see less benefit and it will feel a lot more effort.
For me – I believe that consistent style and code inspection reduce bugs and improve the readability of the system. I have a style guideline, but I don’t always follow it. As I typically work on my own, code reviews aren’t an option.
Your “why” will probably be very similar but the subtle differences will make some of the following items slightly different from mine.
2. Complicated Setup
As I said above, VI Analyzer is all about consistency for us. We want every project to follow the same style guidelines. Unfortunately, VI Analyzer does not make it very natural to create a standard test configuration and share it between projects since there is a single configuration file for test setup and which VIs to test.
These are the steps that I went through to build a standard configuration:
Start with your style guidelines. I made mine into a table and identified what already had a VI Analyzer test, what had a custom test in the community, what could have a custom test and what was not testable.
I downloaded the custom tests that are available online and created a couple of key tests myself. I didn’t do them all, and I will expand my coverage over time.
I used a project to create a VI Analyzer configuration file. I worked through the tests and configured them to my style guideline. Then I removed all of the VIs to be tested. I saved this configuration file as my standard.
I created a VI package which would install the custom tests and the configuration file to a shared location in my documents. Full credit to Fabiola De La Cueva and Delacor for this idea. They have been helping their customers with this for some time. (You can see their video of this on YouTube)
By completing these steps up front, I can introduce a consistent VI Analyzer configuration to a new project quickly and easily.
3. Defining A Trigger
As with any good habit, you need a trigger to tell you when to do it else you often won’t follow through. There are a few common triggers that I have seen people using:
Post-Commit in a CI Server – I’m not a big fan of this one because you now need another trigger to review the results and implement the changes.
Pre-Code Review – This is a great one if you are on a team. You should test with VI Analyzer before a code review. You don’t want to waste time picking up things a machine can identify faster. As I am a solo developer, this one is limited for me.
Feature Branch Merge – If you are using a branching workflow in Git or Mercurial, then you can use a feature branch merge as a trigger. This is a good trigger as it should limit the scope of what needs testing. However, if the change list gets long, then there could be a lot to review.
I wanted my trigger to be a commit. I feel like the code should go through VI Analyzer to be “finished” and only “finished” code should be committed.
This is a problem though. Analyzing a whole project can take many minutes, and I might commit 10+ times a day.
One way to solve this would be to test only changed code, but VI Analyzer lacks any native support for this workflow.
I have developed a tool to tie it to git changes. It isn’t great, so we haven’t shared it yet (it still lacks many essential features), but it has started us using VI Analyzer regularly.
This tool will take a configuration file and then run it against only the code that Git shows as changed. Testing only the changes cuts the test time and now means that it is possible to check at each commit and fix the changes before that commit.
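The tool itself isn’t shared, so the following is not its implementation. It is only a hedged Python sketch of the first half of the idea: asking Git which files have changed and keeping the LabVIEW ones. The extension list and function names are assumptions, and handing the resulting list to VI Analyzer is deliberately left out.

```python
import subprocess
from pathlib import PurePosixPath

# Assumed set of LabVIEW source extensions worth analysing.
LABVIEW_EXTENSIONS = {".vi", ".ctl", ".lvclass", ".lvlib"}


def labview_files(paths):
    """Keep only paths whose extension looks like LabVIEW source."""
    return [p for p in paths
            if PurePosixPath(p).suffix.lower() in LABVIEW_EXTENSIONS]


def changed_files(repo_dir="."):
    """Tracked files added, copied, modified or renamed relative to HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def files_to_analyse(repo_dir="."):
    """The subset of changed files that an analysis run would be limited to."""
    return labview_files(changed_files(repo_dir))
```

Note that git diff against HEAD only reports tracked files, so brand new VIs would need to be added to the index first (or found another way) to appear in this list.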
By overcoming these hurdles, VI Analyzer has become a standard part of our workflow. I hope you can use this to incorporate it as well.
I don’t write code in the best way the first time. No-one does. Instead, if we want to get to a state of clean code (readable, maintainable etc.), we often have to put specific effort into it.
Martin Fowler’s Refactoring book summarises it nicely when talking about refactoring. You need two hats. One for adding functionality and the other for refactoring. Your process may have a nice point to change hats (I use TDD which does) – or you may have to be more deliberate. But at some point, you need to think about refactoring.
I’m not going to go into loads of detail on refactoring now. Instead, I want to focus on one type of refactoring that I use a lot – creating more subVIs.
Why Refactor Code to have more subVIs?
There are several reasons why code with more, smaller subVIs will tend to be easier to work with than a flat VI:
The diagram is smaller – The simplest of them all – your diagram now fits on the screen!
Improved readability through abstraction – This can be a contentious point. We are putting the detailed implementation another click of the mouse away which some people dislike. My experience is that this is outweighed by how much easier the code is to read. By creating a well-named, cohesive subVI, the calling VI is faster to read since you don’t have to worry about the details. You can worry about the problem it is solving and dig into the details if/when you have to. There are also testing and debugging advantages since the new subVI can be debugged in isolation.
Clear and Obvious Coupling – Coupling is one of the primary concepts in software design that you need to grow to understand. A flat diagram can easily hide coupling in the noise, but once you create subVIs, it is obvious if a subVI has too many inputs or inputs that you wouldn’t expect it to have. These are both signs of coupling problems.
Where to start?
I signed up for a presentation at NI Week to talk about clean code and needed to talk about this topic. The day before planning to write the subVI section, I was working on a customer project. As I have a young baby at home, I try to leave at 5 pm every night now, but it was getting on for 5.40 pm and my wife was messaging me to see where I was. So I got the code running but never put on my refactoring hat.
When I came to write my presentation, I realised that stepping through the refactoring of this code was the perfect example! The code I abandoned that evening is the code in the steps below.
Look for Commented Sections
The first and simplest clue is to look for sections of code with comments describing what they do. If it is enough of an important chunk to comment, it is probably cohesive enough to make a good subVI.
See the sections for formatting at the bottom? There are three types of data to be formatted for the table. One is already in a subVI but the other two are just labelled as doing the conversion. This is the classic sign of a subVI that hasn’t been created yet.
The first step here is to turn these all into subVIs.
I’ve created the subVIs, aligned them and given them a consistent naming convention.
Different Levels of Abstraction
I’m still looking for a better term for this, but there is still a smell in the code above for me.
Most of the code is now subVIs doing non-trivial functions. Loading from a log, generating tables. But there is a section on the left that is doing array manipulation. This is a problem because it is too much detail for what this code describes. This code is supposed to “say” load the data from the log, convert the sections to a table and combine them all. But instead this story includes handling memory allocation as well!
So my next step is to abstract this into a more descriptive subVI.
I hope you agree; this code is now much more straightforward to read. It is doing everything it was before (as efficiently as it was) but as developers, we can become faster at understanding the code.
Did you notice the coupling?
This highlighted a nice reason for this refactoring. Why is a build table function using an array of timestamps?
We sucked these in as part of the “create subVI” process, but they don’t belong. We can refactor the timestamp array out by using the rows in the time table instead, which removes the input.
This is a somewhat trivial example, but by abstracting out the subVIs, we can now understand our code in a way that is clearer than before and better visualise the coupling between functional components.
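In a text language the same smell shows up in a function signature. The Python sketch below is a hypothetical analogy, not a transcription of the VI: the names and table layout are invented, and it assumes the timestamp array was only being used to size the table.

```python
def build_table_before(time_rows, value_rows, timestamps):
    # 'timestamps' is only used to size the table: accidental coupling
    # that became visible once this code had its own signature.
    table = [["", ""] for _ in timestamps]
    for i, (time_text, value_text) in enumerate(zip(time_rows, value_rows)):
        table[i] = [time_text, value_text]
    return table


def build_table_after(time_rows, value_rows):
    # The time column already has one entry per row, so it can size the table
    # and the unexpected input disappears.
    table = [["", ""] for _ in time_rows]
    for i, (time_text, value_text) in enumerate(zip(time_rows, value_rows)):
        table[i] = [time_text, value_text]
    return table
```

Both versions produce the same table; the second simply has an interface that matches what the function claims to do.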
Are we done?
Naturally there is no “perfect” state. This was the stage I stopped at since I had removed the immediate “smells” and the code worked well. Now that the code structure is clearer, though, it actually highlights some other things we could consider:
The build table code could be generalised for reuse by removing some of the labelling. If I need this same function elsewhere, I may come and grab this code.
The program flow shown is that we generate each subtable before building a large table. It may give higher performance to insert each of the subtables directly into the pre-allocated table. Right now the solution above is simpler, and I don’t have any performance concerns, so I haven’t done this.
I’m going to keep on about this – These potential changes are much more obvious in the new structure than the old – that is why those first steps are important. There is also a point where you have to say good enough, or there isn’t enough information to know what is the next best step (options 1 and 2 above may clash so we don’t do them until we know which we want).
A Note On Testing
Isn’t there a risk of breaking your code with refactoring? In theory refactoring should not make any functional changes to your code but we have all done things with unintended consequences!
Allowing low-risk refactoring is one of the ways that unit testing leads to better code. Because I have tests around this section of code, I could change it as much as I wanted with confidence that it still works. This is why we treat unit testing as a foundational principle at Wiresmith Technology.
A loop within an application component. It has a single reason to run.
Why Loops Run
By “run” what I mean is what triggers an iteration of that loop. I’ve identified these sources:
Time – Intended to run every x ms.
Event – Either a UI or User Event, the loop contains an event structure.
Data – Basically a queued message handler but “data” means anything that means a loop waits for data. A data interface could be queues, notifiers, streams, DAQmx etc. Some external process is forcing us to wait for new data to be available.
The rule means, if one loop is trying to use two of these methods (or multiples of one of these types) then problems may occur. Examples might be:
Using the timeout on an event structure (did you know it won’t fire if user events fire, even if you aren’t handling them? Thanks Chris R for this demo!) Thanks Fab for the correction: This was a bug that has been fixed.
Using the timeout of a queue to perform a repetitive action.
As you can see – time is usually the conflict because the others are kind of obvious to avoid having in the same loop.
Why This Rule Exists
Put simply, in most cases where two of these exist there is a conflict which is hard to resolve consistently 100% of the time, mostly because the event and data drivers depend on external components which aren’t predictable.
Take the timeout case of the queued message handler. If you want to check a state about every 1 second, then the best case is a new external message once every 2 seconds. Then 1 second after that message arrives you perform the check. Maybe you will get a second check before the next message, maybe not. Most likely you will end up performing the check every 2 seconds. Perhaps that is acceptable?
However then an external component generates messages every 0.1 seconds. Now that check doesn’t happen ever, through no fault of yours! (In the context of the code in this loop).
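The arithmetic here is easy to check with a small simulation. The Python sketch below is an illustrative model under simple assumptions (messages arrive at a perfectly regular period, handling takes no time, and a message that arrives exactly at the deadline wins); it counts how often the timeout case of the loop actually runs.

```python
def count_timeout_checks(message_period_ms, timeout_ms, duration_ms):
    """Count timeout-driven checks in a loop that also waits on messages."""
    now = 0
    next_message = message_period_ms
    checks = 0
    while now < duration_ms:
        deadline = now + timeout_ms
        if next_message <= deadline:
            # A message arrives first, so the timeout is reset.
            now = next_message
            next_message += message_period_ms
        else:
            # Nothing arrived in time: the timeout case runs the check.
            now = deadline
            checks += 1
    return checks


# Over 10 seconds with a 1 second timeout:
slow_sender = count_timeout_checks(2000, 1000, 10000)  # one check per 2 s gap
fast_sender = count_timeout_checks(100, 1000, 10000)   # the timeout never fires
```

With a message every 2 seconds the check runs 5 times in 10 seconds rather than 10, and with a message every 0.1 seconds it does not run at all.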
But I Need To…
Perhaps there is a case where you want to break this rule. I have two solutions:
Use a second loop as a “proxy”. For example in the case above have a second loop which runs every 1 second and enqueues a message for the check.
Another example is when you have an API which generates messages on an event but you need it in a queue. If you have used the actor framework, this is how you extend actor core for GUIs.
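As a text analogy of the proxy approach, here is a minimal Python sketch using threads and a queue. All of the names are hypothetical. The timer loop has time as its only reason to run, the handler loop has data as its only reason to run, and the check arrives as just another message however chatty the other senders are.

```python
import queue
import threading
import time


def timer_proxy(out_queue, period_s, ticks):
    """Time-driven loop: its only job is to post a 'check' message each period."""
    for _ in range(ticks):
        time.sleep(period_s)
        out_queue.put("check")
    out_queue.put("stop")


def chatty_source(out_queue, count):
    """Stand-in for an external component posting messages as fast as it likes."""
    for i in range(count):
        out_queue.put(f"status {i}")


def message_handler(in_queue):
    """Data-driven loop: blocks on the queue with no timeout."""
    handled = []
    while True:
        message = in_queue.get()
        if message == "stop":
            return handled
        handled.append(message)


def run_demo(ticks=3):
    messages = queue.Queue()
    threads = [
        threading.Thread(target=timer_proxy, args=(messages, 0.01, ticks)),
        threading.Thread(target=chatty_source, args=(messages, 50)),
    ]
    for thread in threads:
        thread.start()
    handled = message_handler(messages)
    for thread in threads:
        thread.join()
    return handled
```

Every tick of the proxy results in exactly one handled "check" message, which is the guarantee the queue timeout could not give.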
Break The Rule… And Be Very Careful
If you don’t see a way to design around this rule, then you need to know you are breaking it and be very careful. Most likely by adding some management code to a loop.
The example I could think of was if you needed data from two loops. In this case, you would write code to read from each queue. If one read timed out, perhaps you store the successful data and then try again, reading only the failed queue.
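That management code could look roughly like the Python sketch below. PairReader is a hypothetical helper, not part of any framework: it remembers whichever half has already arrived and, on the next attempt, reads only the queue that previously timed out.

```python
import queue


class PairReader:
    """Collect one item from each of two queues, retrying only the missing one."""

    def __init__(self, queue_a, queue_b):
        self._queues = {"a": queue_a, "b": queue_b}
        self._pending = {}

    def try_read(self, timeout_s):
        """Return (a, b) once both halves are available, otherwise None."""
        for name, source in self._queues.items():
            if name in self._pending:
                continue  # already holding this half; do not read it again
            try:
                self._pending[name] = source.get(timeout=timeout_s)
            except queue.Empty:
                pass  # leave this half missing and retry on the next call
        if len(self._pending) < len(self._queues):
            return None
        pair = (self._pending["a"], self._pending["b"])
        self._pending = {}
        return pair
```

Even in this small form there is state to manage between iterations, which is the extra care that breaking the rule demands.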
I don’t think there are many cases, but rules are about knowing when you break them and why!
The word of the year here at Wiresmith Technology is process. In areas where I have standardised processes life has got easier, less stressful and more reliable. Now I’m looking at the software processes to see where we can get the same benefits.
Something that I have wanted to address for a while is architecture. Working on my own has given me the benefit of being able to be quite ad-hoc and try different designs on different projects.
Well, I often think coming back to your own work after 6 months isn’t so different from working with someone else and I’ve certainly felt the drag of having to review how I built the architecture on each project. So I want to at least have some standard templates.
What I found when I came to it was I first had to define what is in a program!
Language Is Important
There are so many conflicting terms and every framework has its own terminology.
I really wanted to start with knowing what I want in a generic sense. By doing this without looking at specific frameworks it gives me the freedom to find a framework that fits the way I work best (as well as the freedom to change or not use frameworks depending on the project).
I’ve seen this approach work in my business. I’ve been trying to find tools to help me be more productive, but until I decide what process these tools are supporting, I waste hours trying to choose as I have no way to determine what is best!
So before even trying to work with templates or frameworks, I reviewed my previous projects to try and pull out and name the different elements of my architectures so I can map other tools to this.
What I Picked
So here is what I listed as my architectural definitions. Before you read them, understand I am sharing these for your curiosity – you may have your own set of definitions in your team already. This isn’t about right or wrong, this is about consistency between team members and projects.
An asynchronous VI with its own lifetime and own control of when to run. This is the top level of the architecture design.
A piece of engineering data e.g. acquired data. Not to be confused with messages: we split this concept as messages are more framework-oriented.
A framework command for a process to do something. Not to be confused with data: although messages are data in the strictest sense, they are not directly related to the data involved in the engineering domain.
A process that receives heterogeneous messages and data. Not to be confused with a data-driven process: this has homogeneous data to handle.
A set of related code. In our system, it is a class. It is generally unit testable. Not to be confused with Module (DQMH) or Actor (AF): these are processes or message handlers.
A loop within an application component. It has a single reason to run.
I’m pretty happy with this – the one element of confusion is where the actor style module fits (whether that is an actor framework actor or another QMH based framework like DQMH). In reality, this sits somewhere between a module and a process but I need to experiment more with how to think about those.
The one I think is particularly important is modules. Too often the important logic gets muddled and mixed with framework code.
For me the next step now is to create templates or frameworks to handle these items in a consistent way – more on that in a future post.
My challenge to you is to think about this for yourself. Maybe you already have a framework so you don’t need definitions like this, but where do you find you or your team are inconsistent over time, and would a common language help?
The APIs that you have to test are not always simple. As well as passing data they may involve events (with the front panel or with user events).
The other day I needed to test that an event fired as part of a test case. I could see a generic solution, so I created a template for it. I had two requirements:
If the event doesn’t fire – test fails.
If the event fires with the wrong data – test fails.
In my given-when-then sequence, we end up with a test that follows the structure:
Given: Who knows, in this case, a UI library has been tied to a control.
When: We take some action that should cause an event on that control.
Then: Check the event.
To check the event we create an event structure outside of a loop as we don’t want to handle multiple events. We need two cases:
A timeout case with a suitable timeout – In this case, we call the Test Case.lvclass:fail.vi to fail the test. This should never run if the when code fired the event.
A case that handles the event – If you don’t care about the data then you can do nothing here, otherwise, include tests on the data included in the event.
Dynamic Event Registration: If this is a user event then you will need to register for the event. I’ve included this in my template, but you must move the event registration to the given case. If you haven’t registered the event before the action in the when case, it won’t ever fire.
Parallel/Dynamic Event Generation: If your event is in some dynamic code you may need to have this running. My advice: DON’T. Try and pull out the internal API and test synchronously. Asynchronous testing in LabVIEW introduces timing concerns which make your tests much more complicated.
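For readers who think in text, the same given/when/then shape can be sketched in Python. This is an analogy rather than the LabVIEW template: Button is a hypothetical stand-in for the library under test, and a queue read with a timeout plays the role of the event structure outside a loop.

```python
import queue


class Button:
    """Hypothetical stand-in for the control or library under test."""

    def __init__(self):
        self._subscribers = []

    def register(self):
        """Register for events; returns the queue they will be delivered to."""
        events = queue.Queue()
        self._subscribers.append(events)
        return events

    def press(self, value):
        for events in self._subscribers:
            events.put({"name": "value_changed", "value": value})


def wait_for_event(events, timeout_s):
    """Timeout case: fail the test if nothing arrives in time."""
    try:
        return events.get(timeout=timeout_s)
    except queue.Empty:
        raise AssertionError("event did not fire before the timeout") from None


def test_press_fires_value_changed():
    button = Button()
    events = button.register()                     # Given: register first
    button.press(42)                               # When: take the action
    event = wait_for_event(events, timeout_s=0.5)  # Then: check the event
    assert event["name"] == "value_changed"
    assert event["value"] == 42                    # wrong data also fails
```

Registering after the press would leave the queue empty and hit the timeout failure, which mirrors the registration ordering note above.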
When writing applications that will be used by anyone else you will need a configuration file. In my experience, this is almost universal and the more I make configurable, the more powerful the software becomes and the fewer small changes I have to make for my customers.
Where do we save config files in LabVIEW? The landscape is more complicated than you would think! In this post, I’m going to summarise what we do on our LabVIEW projects. We are focusing on Windows since RT is simpler (put it in /c/) and I don’t use Mac or Linux with LabVIEW.
Types of Config Data
I’m going to refer to two types of config data:
Global Data: No matter who logs into the system they should share the same configuration. In my experience, this covers the vast majority of industrial applications.
User Data: Configurations that should change depending on the user. This might be screen layouts for example.
Files or Registry?
Microsoft is actually quite keen that you put this data in the registry – that is what it is for. There is a Software folder in each top level folder where you should create your own Company/App folder structure and you can store settings as different variable types.
For user data, you can store it under HKEY_CURRENT_USER and for global data, you can store it under HKEY_LOCAL_MACHINE. In many ways it is a pretty nice solution to the problem, however, I’ve avoided it for 3 reasons:
Files are much easier for users to get, edit or send you. Whilst I don’t want them directly editing the files much it is great that when there is a problem they can send me a file or even a screenshot of the file (when it is readable) so I can understand their setup.
Files make save as… much easier if the user wants to be able to switch between configurations.
Files are universal. Although I don’t have much cross-platform code I like that I can create multi purpose configuration libraries that work on Windows or RT. Without this, I would have to have different code for the different platforms.
I am curious though about who is using this. Please leave a comment below and let me know why you like this and if I have anything wrong.
If Files, Where?
OK, so we have decided on files, where should we put them? Helpfully, Microsoft has an article on this. However, 7 years on there are still issues!
User data is the easiest and where Microsoft’s advice still works. In each user folder, there is a hidden AppData folder. This is designed to hold user-based configuration files and so the user has full read/write access to this. It is just hidden to protect you from “users with initiative” as Fab puts it in this presentation! Within here you should create a folder structure with Company Name\App Name to follow the standard convention.
To get this path use the Get System Directory.vi with the User Application Data input.
Global data is where this gets messy. There is an equivalent folder to the user AppData folder for this purpose, but…
In XP all worked well. It was located under All Users\Application Data and all users had write access and software worked.
Then Windows 7 came and two changes occurred:
The location was changed to C:\ProgramData (A hidden folder)
Folders had restricted access. The creator/owner has write access but no-one else.
One use case for this is to install fixed configurations at installation time and this works well since everyone has read access. However, if you need to write these after installation you normally do not have access.
The solution to use this? You need to set the permissions as part of a post install step to allow all users to have write access to the relevant folders.
One day, I may sit down and get this set up automatically as a post install step. For now, I have too many concerns about managing failures of this causing extra support. My solution? Use the public documents folder.
I follow the same structure but in Public Documents instead of Public Application Data. So far I’m happy with this decision and I haven’t had any headaches due to this.
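The resulting folder layout can be summarised in a few lines of Python. This is only a sketch of the convention, not LabVIEW code: it assumes the standard Windows environment variables APPDATA (the per-user application data folder) and PUBLIC (whose Documents subfolder is Public Documents by default), and the company and app names are placeholders.

```python
from pathlib import PureWindowsPath


def config_dir(scope, company, app, env):
    """Folder for config files, following the Company Name/App Name convention.

    scope "user" uses the per-user application data folder.
    scope "global" uses Public Documents, which all users can write to.
    env is a mapping of environment variables (os.environ on a real system).
    """
    if scope == "user":
        base = PureWindowsPath(env["APPDATA"])
    elif scope == "global":
        base = PureWindowsPath(env["PUBLIC"]) / "Documents"
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return base / company / app
```

On a real machine you would pass os.environ as env; these folders can be redirected, so treat the variables as a reasonable default rather than a guarantee.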
I would love to hear your thoughts. What do you do? Am I wrong?
The more I learn about cyber security, the more I realise how much it feels like we are on the back foot.
Fundamentally the issue is that the tactics and techniques used by hackers seem to move forward much faster than technology at large, with many things we depend on having been designed before security was such a significant consideration.
WannaCry certainly brought these concerns to the forefront again, with legacy systems making the front page. The media scoffed at hospitals using Windows XP still, but in our industry, we know that it is not a simple job to keep complex and custom systems up to date. So what might this mean to the LabVIEW community?
Working with IT More
Antivirus and automatic updates can cause havoc with operational systems but, as shown, having insecure devices on the network can provide a weak link for exploitation. So while IT can be a pain to work with on these systems, we must understand their wider concerns.
We probably need to develop some best practices for system updates – is there a way we can schedule updates to minimise impact? Or can we guarantee the system stays off the network, so it doesn’t risk spreading malicious software? Alternatively, can critical elements be run on LabVIEW RT which will likely require less frequent updates than desktop systems?
Stuxnet showed that you must also consider offline threats. USB sticks will continue to threaten offline systems, and if users transfer data to and from systems with them, they must be educated about the risks of using un-vetted USB sticks.
Minimum System Access
I always think one of the best and most basic security practices is that of minimal access. If you don’t need the web server, disable it. Firewalls should only allow access to required systems, and we have the option to install them on Linux RT targets now.
Critical to this are things like VI Server remote access. This allows for arbitrary code execution, which is a hacker’s dream! Make sure you turn it off if you don’t need it. If you do need it, make sure you protect it well.
If you have a multi-device system such as a test rack, then including a router which can provide an internal network with wider access but restrict the external network would be a sensible approach.
Minimum access also means only the required permissions for any given user. You should ideally never be running as an administrator as standard. I know it’s easier! But it also makes things much easier for malicious code. When you hit a permissions error, then make sure you give the standard user the permissions it requires. Using Linux trains you well in this and is one of the benefits of learning it. (I know Steve has found it worthwhile)
An example of where these principles are important is the new Petya variant. The malware spreads through various means. This includes the SMB flaw that WannaCry used, but it will also then sniff the machine for administrator credentials. If it finds them, it will use them to remotely access other systems that the account has access to, spreading further.
I also have it on my list to look more into the write filters on the Windows Embedded systems which mean that anything written to the disk is only temporary and every reboot brings it back to the original state. The system can still get infected, but it makes a recovery much easier.
Thinking About Recovery
One thing I have learnt over the past couple of years is a backup is only as good as the recovery. If a customer had a machine infected and was losing money while it was down, how fast could you recover it?
I take images of all RT systems, but I am considering whether Windows-based systems should also have an image taken and a recovery disk created on delivery. Then if a machine does get infected (and doesn’t store critical data that has to be recovered first), it can be up and running again in hours instead of days.
I know there are a lot more questions than answers there! But I think it is an interesting discussion to have and something I aim to improve on over time.
After my previous post about Learning LabVIEW OOP there were some comments on by reference vs. by value which often come up when talking about OOP. I think there are two reasons that these are tightly linked to conversations about OOP.
In “classical” OOP languages everything is by reference but in LabVIEW OOP is by value. This causes a clash when people have learned OOP from these languages.
We do more by reference work in non-OOP LabVIEW than we sometimes like to admit.
I have been thinking about the techniques and analogs to these lately anyway so this is a bit of a meaty article covering the options that I see for implementing these and some thoughts on how they fit into teaching OOP.
By-Reference vs. By-Value
Let’s first define my interpretation of these items. I like to think of them at this level as how data behaves rather than definitions of implementation.
By-Value: If you take a wire/data in LabVIEW and change it then it changes only for that piece of code and the code that is dataflow dependent on it. There is also no way another piece of code can change the data on the wire. Branching the wire risks creating a copy.
By-Reference: The data is stored in one memory location. When you make changes they might affect other components that don’t have a dataflow dependency, and if you read it twice in a row another piece of code could have changed it in between.
(This may differ from a classic computer science definition but my main concern is the behavior I see, perhaps a different term is required but stay with me!)
These strongly overlap with data communication. Another way to think of this is that by-value is communication on a data wire, whereas by-reference is a tag-based communication method (in some situations).
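LabVIEW is graphical, so the two behaviours can’t be shown directly in text. As a rough analogue only, here is a minimal Python sketch of the difference. The function names and sample values are made up for illustration, and Python’s copy semantics are a stand-in for wires, not a description of how LabVIEW manages memory.

```python
import copy


def scale_by_value(samples, gain):
    """By-value style: work on a private copy, like a branched wire."""
    local = copy.deepcopy(samples)
    for i in range(len(local)):
        local[i] *= gain
    return local


def scale_by_reference(samples, gain):
    """By-reference style: change the single shared memory location."""
    for i in range(len(samples)):
        samples[i] *= gain


# By-value: the caller's data is untouched, only the result carries the change.
original = [1.0, 2.0, 3.0]
scaled = scale_by_value(original, 2.0)

# By-reference: a second holder of the same data sees the change even
# though it has no dataflow dependency on the call that made it.
shared = [1.0, 2.0, 3.0]
other_view = shared
scale_by_reference(shared, 2.0)
```

After this runs, `original` still holds its starting values, while `other_view` has changed without ever being passed to a function. That second case is the side effect described above.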
So Why Use By Reference In LabVIEW?
My preference is to always lead with by-value where it works. I think this is key to what makes LabVIEW a powerful language. The data on the wire is yours to do what you want with and you don’t have to worry about side effects when you are programming. I suspect it is one of the key things that makes LabVIEW much easier to pick up for people without a software background.
There are perfectly valid reasons to use by-reference though, either in spite of, or because of, these side effects. The following items are cases where I look to by-ref:
Shared Application Resources: This is not a great term but what I mean by this are resources such as an error handler or a system configuration where you want every piece of code singing from the same hymn sheet.
Hardware Resources: This is one of the most common – DAQmx and VISA already all run on a by reference API. If I have to add to the API (attach additional data) I will use a by-ref scheme so it continues to work as you expect.
Huge Data: If a data structure is a big proportion of your application memory you need to make sure you don’t copy it often. You see this in the IMAQ library where images are handled by reference (and the confusion it can cause!). This is the in spite of case – you don’t get any programming benefit but you have to use it due to the constraint of the system.
How To Do By Reference In LabVIEW
There are multiple techniques to get the behaviour I mentioned above. I have put down the key ones below; my favourite will be obvious!
Global variables are the simplest and most dangerous technique because of race conditions. This isn’t a problem in single-process languages but in LabVIEW it can get you into lots of trouble!
There is one case that I may use them which is for WORM (write once, read many) globals which can be useful for configuration data but I never use them with OOP.
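To make the race condition concrete, here is a small Python sketch of a lost update. It is only an illustration: the counter and the two “loops” are hypothetical, and the interleaving is written out by hand rather than produced by real parallel code, so the outcome is the same every time it runs.

```python
# Hypothetical global: a shared counter that two parallel loops increment.
counter = 0

# Each loop does "read, add one, write back". The steps are interleaved
# by hand here so that the race is visible and repeatable.
read_by_a = counter        # loop A reads 0
read_by_b = counter        # loop B reads 0 before A has written back
counter = read_by_a + 1    # loop A writes 1
counter = read_by_b + 1    # loop B also writes 1, overwriting A's update

# Two increments were requested but only one survived.
lost_updates = 2 - counter
```

A WORM global avoids this because nothing performs a read-modify-write after the single initial write.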
Data Value References (DVRs) allow you to create your own reference wire to any data type. You access the data through the In Place Element structure, which protects the data: no other code can access it until you are finished with it. This is very important for preventing race conditions.
This is my preferred method for by-ref objects. I would create the class normally but change the standard methods to use DVRs of the class rather than the class itself. What I have found is:
Good – 100% scalable. Want 1/5/20/1000? Not a problem.
Good – The call is synchronous. Once the subVI completes the function it performs is complete.
Good – I have heard criticisms that this can lead to having more wires on the diagram. I think this is a good thing! Wires make it obvious what is coupled to what. Variables and AEs don’t make that obvious.
Good – The property nodes for objects support this with no extra code.
Bad – The boilerplate is tedious! Creating the references and the In Place Element structures, with their weird error handling, takes time.
Bad – References can be invalid at run time, a problem which the FGV/AEs avoid.
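As a loose textual analogue of the DVR behaviour described above, here is a Python sketch. The `DataValueReference` class, its `in_place` method and the counter example are all hypothetical names invented for illustration. The sketch only mimics three of the behaviours listed: exclusive access while you modify the data, scaling to many references, and a reference that can become invalid at run time.

```python
import threading
from contextlib import contextmanager


class DataValueReference:
    """Hypothetical stand-in for a DVR: one value plus a lock guarding it."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()
        self._valid = True

    @contextmanager
    def in_place(self):
        """Exclusive access, loosely like the In Place Element structure."""
        with self._lock:
            if not self._valid:
                raise RuntimeError("reference is no longer valid")
            box = [self._value]   # the caller reads and writes box[0]
            yield box
            self._value = box[0]  # write back before releasing the lock

    def delete(self):
        """Invalidate the reference; later accesses raise an error."""
        with self._lock:
            self._valid = False


def add_many(ref, times):
    """Read-modify-write through the reference, one locked step at a time."""
    for _ in range(times):
        with ref.in_place() as box:
            box[0] += 1


counter_ref = DataValueReference(0)
workers = [threading.Thread(target=add_many, args=(counter_ref, 1000))
           for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

with counter_ref.in_place() as box:
    total = box[0]
```

Because every read-modify-write happens while the lock is held, no increments are lost, and each new `DataValueReference` is an independent instance.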
I’m not going to get into a debate on what term to use for functional global variables (FGVs) or action engines (AEs). What I mean is a non-reentrant VI which uses an uninitialised shift register to store data. Normally developers will have an enum input which defines what function is performed, and/or the core FGV/AE is wrapped in another API to allow for an easier connector pane.
These are not traditionally referred to as a by-ref programming method but I would argue they are. The “reference” is which VI you are calling. That is what defines which data is modified or accessed and any changes can be seen anywhere else in the program.
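The same idea can be sketched in Python, purely as an analogy. Here a closure plays the part of the shift register, a lock stands in for the VI being non-reentrant, and an enum selects the function. The `Action` values and the totaliser example are assumptions made for illustration.

```python
import threading
from enum import Enum


class Action(Enum):
    INIT = 0
    ADD = 1
    READ = 2


def make_action_engine():
    """Build one engine; its state lives inside it, not on a wire."""
    state = {"total": 0}      # plays the role of the shift register
    lock = threading.Lock()   # one caller at a time, like a non-reentrant VI

    def engine(action, value=0):
        with lock:
            if action is Action.INIT:
                state["total"] = value
            elif action is Action.ADD:
                state["total"] += value
            # READ falls through and simply returns the stored value.
            return state["total"]

    return engine


# The "reference" is which engine you call, not anything passed in.
totaliser = make_action_engine()
totaliser(Action.INIT, 10)
totaliser(Action.ADD, 5)
current = totaliser(Action.READ)
```

Any code that can see `totaliser` reads and changes the same stored total, which is why this behaves like by-reference even though no reference is passed around. Needing a second independent total means building a second engine, which mirrors the scalability limitation.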
When I started my company I had already gone down the DVR route, having experimented with both, so I haven’t used FGVs/AEs as extensively in big applications. The reasons I decided not to start with them were:
No wires so coupling is somewhat hidden (you would have to view the VI hierarchy).
Bloated connector panes (though wrapping it in another API does help with this).
No scalability – you either have to write an addressing scheme into the FGV or maintain multiple versions of the VI.
That said, these are the preferred method of many. They are well understood by different developers, are simpler to create than DVRs and share the benefit of being synchronous.
It is rare that I use these, but one exception is when this is the exact behavior I want! If I want a singleton object (for my error handler, for example) I create an FGV that stores a DVR/queue reference and initialises it on the first call. That way I get the same reference everywhere in my application.
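The singleton pattern above translates fairly directly into text. In this Python sketch the `ErrorHandler` class and `get_error_handler` function are hypothetical: the function plays the part of the FGV, and the object it hands back stands in for the stored DVR or queue reference.

```python
import threading


class ErrorHandler:
    """Hypothetical shared resource standing in for the stored reference."""

    def __init__(self):
        self.errors = []

    def report(self, message):
        self.errors.append(message)


_handler = None
_handler_lock = threading.Lock()


def get_error_handler():
    """Create the handler on the first call, then always return the same one."""
    global _handler
    with _handler_lock:
        if _handler is None:
            _handler = ErrorHandler()
        return _handler


# Any part of the application gets the same instance without wiring it in.
get_error_handler().report("DAQ timeout")
same_everywhere = get_error_handler() is get_error_handler()
```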
Queued Message Handlers/Actors
This is a more unusual item to add here, but it can be used for the same cases as above, and I believe it often is.
Increasingly these are being used like modules within a system. For example, you have an actor for each instrument you want to talk to and you enqueue commands for it to complete.
This mode of operation is very similar to a traditional by-ref model. The “reference” is the queue reference or actor reference which ties you back to the “data” stored in the shift registers of the QMH. The QMH loop protects access to the shared resource by processing messages one by one, protecting you from race conditions.
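A rough Python analogue of this arrangement is a worker thread that owns its state and reads commands from a queue. Everything here is an assumption made for illustration: the `set`, `read` and `stop` messages, the setpoint value and the reply queue are hypothetical and are not part of any LabVIEW framework.

```python
import queue
import threading


def instrument_loop(inbox, log):
    """Hypothetical QMH: owns its state and handles one message at a time."""
    setpoint = 0.0                       # plays the role of the shift register
    while True:
        command, payload = inbox.get()   # blocks until a message arrives
        if command == "set":
            setpoint = payload
        elif command == "read":
            payload.put(setpoint)        # reply on a queue the caller supplied
        elif command == "stop":
            break
        log.append(command)


inbox = queue.Queue()   # the "reference" that other code holds
handled = []
worker = threading.Thread(target=instrument_loop, args=(inbox, handled))
worker.start()

inbox.put(("set", 2.5))             # returns immediately, handled later
reply = queue.Queue()
inbox.put(("read", reply))          # two-way traffic needs a reply channel
reading = reply.get(timeout=1.0)

inbox.put(("stop", None))
worker.join()
```

Only the loop touches `setpoint`, and messages are handled in the order they were queued, so the read sees the value set just before it. Note that the caller has to supply its own reply queue and wait on it to get an answer back.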
It isn’t exactly like the other options. The key difference is that it can operate independently as well as responding to other requests, which can make it hugely more powerful.
There is a major added complexity with this though, which is that they are asynchronous. This means two-way communications are difficult. For example, what happens if there is an error in the QMH? Also you can’t understand the timing from the block diagram. I find I have to create sequence diagrams in order to understand the program flow.
This is the reason one of the core tenets of an actor based system is that it is a request and you can’t care when or how it gets done. This rule means you must design your system in order to avoid these complexities, but I think this is a hard rule to follow consistently!
For these reasons, I avoid using these unless I need it to run independently or there are multiple classes interacting within it. I tend to use these for “processes”. For example, a DAQ system where the data just gets published onto an event when it is ready.
So When Do You Teach It?
So coming back to the previous article, when do you teach someone about this?
I would argue that this doesn’t need to be intertwined with OOP. In many cases, if they aren’t new to LabVIEW they will already be using one of these techniques. Sticking with that is the simplest route.
If they are new, I believe that the decision between these is probably an architecture decision, as each has pros and cons in different scenarios. It is hard to teach them all at once so I would look at your typical architectures. Does one of these tend to form the backbone or default option? (In my case it is the DVRs.) If so, I would start with that. You can teach the other methods as they are required. If you suddenly need all four at once you could even hide one method behind another to get a new developer on board quickly.
Let Me Have It!
This is a bit of a work in progress in my thought process, which the question above prompted. I’m pretty sure my terminology isn’t great but I feel the idea is solid, so please comment below or find me on Twitter or the NI forums. I find the discussion very helpful for understanding this better.