During late spring 2009, a digital crime wave spread across the internet.
A collection of more than 70 different programs infected the personal computers of unwary users, silently turning control of their systems over to cybercriminals.
Undetected by most antivirus programs until mid-July of that year, the malicious software (aka malware) compromised systems located in more than 190 countries and half of the Fortune 100. Many of the programs were very similar; others were completely different. The one common theme among the variants: Once they infected a user’s system, the programs would send a message to a server in Spain and wait for commands.
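The phone-home pattern described above, a bot polling a remote server and executing whatever command comes back, can be sketched in miniature. Everything here is hypothetical and inert: stub handlers stand in for whatever actions an operator might order, and no real protocol or server is implied.

```python
# Illustrative sketch of a botnet's command-and-control loop: the infected
# machine receives commands and dispatches them through a handler table.
# All names and commands are invented for illustration.

def dispatch(command, handlers):
    """Look up a received command in the bot's handler table; ignore unknown ones."""
    handler = handlers.get(command)
    return handler() if handler else None

# Stub handlers standing in for whatever actions an operator might order.
HANDLERS = {
    "report": lambda: "sent host info to server",
    "sleep":  lambda: "went dormant",
}

# Simulate a sequence of commands received from the control server.
for cmd in ["report", "unknown", "sleep"]:
    print(cmd, "->", dispatch(cmd, HANDLERS))
```

The structure, a quiet poll-and-dispatch loop rather than any single signature behavior, is part of what made the thousands of Mariposa variants look alike to investigators even as their code differed.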
Such collections of compromised computers, known as botnets, have become a common tool of online criminals. But this one, the Mariposa botnet, was exceptional: The malware spreading the cybercriminals’ influence had taken control of more than 1 million computers, with the attackers eventually creating more than 4,000 different variations to dodge antivirus defenses.
“It would be easier for me to provide a list of the Fortune 100 companies that weren’t compromised, rather than the long list of those who were,” Christopher Davis, CEO of Defense Intelligence, the security firm that originally detected the Mariposa botnet, said at the time.
By late summer, researchers at Georgia Tech had partnered with Defense Intelligence and Spanish antivirus firm PandaLabs to hunt down the group of criminals behind the Mariposa botnet. The academic researchers and private security professionals, known as the Mariposa Working Group, studied the attackers and, in a bold move, took control of the central servers used to manage the botnet. That action, combined with subsequent mistakes made by the botnet’s operators, at first known only as the DDP Team, enabled researchers to discover their identities.
Three men, who ranged in age from 25 to 31 years old, had started the botnet as a hobby, but by the end they were making 3,000 euros each month renting out the compromised systems to other criminals. The Mariposa Working Group’s efforts ultimately led to the arrest of the botnet’s operators in Spain in early 2010. It turned out the three had bought the software used to build Mariposa on the internet rather than writing it themselves.
Because Spain has lax cybercrime laws, none of the men went to jail.
“Going after botnet operators manually is time-consuming and exhausting,” says Paul Royal, a research scientist at the Georgia Tech Information Security Center and one of the members of the GTISC team that investigated Mariposa. “Think about all the effort that was put into the Mariposa takedown and discovering the identities of the operators, all for just three guys who didn’t even have the expertise to create the software they used to build the botnet. Something else has to be done.”
*
The Mariposa case is notable not only for its scope, but also for the fact that an investigation into it resulted in the identification of suspects and their subsequent arrests. For more than two decades, online criminals, spies and “hacktivists” have employed viruses, trojan horses and other means of attack to infect consumers’ computers, compromise businesses’ networks and steal secrets from government systems. The seesaw battle between attackers and defenders has been peppered with the occasional arrest, botnet shutdown or lawsuit against the attackers. But on the whole, the criminals continue to win, stealing money and secrets while remaining out of law enforcement’s reach.
The security industry has shifted its efforts from trying to keep attackers at bay to classifying and identifying attacks once they have already happened. Companies used to build walls around their networks to try to keep out the digital barbarians; today, security teams are increasingly focused on what to do when attackers inevitably break through the gates. “We know prevention is not possible,” says Manos Antonakakis, PhD CS 12, chief scientist at network security firm Damballa. “You cannot guarantee prevention in any way in network defense, so we need to get familiar with the notion that we are already in a state of compromise.”
The list of companies and government agencies that have had data stolen by hackers is lengthy and growing all the time: Google, LexisNexis, Lockheed Martin, Adobe, Sony, LinkedIn, the South Carolina Department of Revenue, LivingSocial and the U.S. Department of Veterans Affairs have all lost sensitive business data to attackers. As online thieves and spies—whether private citizens or in the employ of a nation-state—become more and more advanced, hardly a week goes by without news of another data breach.
But as the threats grow, researchers at Georgia Tech are working to arm individuals, companies and governments against them. Through a number of projects and research initiatives at the Georgia Tech Information Security Center and the Georgia Tech Research Institute, many of them classified, researchers are building better ways to protect data and networks and root out attackers. Because what’s at stake isn’t just sensitive data, money or state secrets—time gets stolen, too.
“The real problem we have today is that the attacker can spend five minutes and make me or someone else spend days in analysis time on something that they send our way,” says Chris Smoak, CS 04, MS IS 12, a GTRI research scientist. “I want to get to the point where if they spend five minutes, I only have to spend five minutes or less.”
Going forward, the success of a security company will be measured not by how many attackers it keeps out, but by how quickly its technology detects a system compromise, how quickly it responds and whether its efforts ultimately hold the attackers at bay.
One Tech-led security effort, called Apiary, analyzes malicious software to give researchers information about the malware’s capabilities and associate the attack with similar programs already identified. Apiary came about in 2010, when Georgia Tech consolidated three network security labs into a single group, the Cyber Technology and Information Security Laboratory (CTISL). The new mega-lab, which operates under the umbrella of GTRI, started with 85 people and about $24 million in research grants; those numbers have since doubled. Its projects now include making networks more resilient to attacks, automating malware analysis and gathering data on attackers.
Apiary also helps companies work together for their collective defense, sharing sensitive information in an anonymous environment—and this may be just as important as cutting-edge research, says Bo Rotolini, director of the CTISL. “Our goal is to stand up a community that uses the system and adds collective intelligence,” he says. “The system is better the more people participate.”
Apiary gives defenders more data on the tools and techniques used by the many attackers trying to get into company networks. To do that, it uses virtual machines running in a large computing cluster to crunch through potentially malicious programs. The system then compares the analyzed malware with some 65 million samples contained in Apiary’s “digital zoo” of malicious code, looking for a match or at least trying to determine if the software could be related to an already-known malicious program.
“We can say, ‘You know what, we’ve never seen this before, but it sure is doing some really bad stuff, so you better go isolate this thing, because you are being targeted specifically,’” Rotolini says.
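The comparison step described above can be sketched in miniature: represent each sample by a set of observed behaviors and score new samples against a “zoo” of known families. The feature sets, family names and threshold below are invented for illustration; Apiary’s actual matching across 65 million samples is far more sophisticated.

```python
# Toy version of similarity matching against a zoo of known malware families:
# each sample is a set of observed behaviors, scored with Jaccard similarity.

def jaccard(a, b):
    """Similarity between two behavior sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

# Hypothetical behavior profiles for two known families.
KNOWN = {
    "mariposa_variant": {"contacts_c2", "usb_spread", "msn_spread", "keylog"},
    "adware_family":    {"browser_popup", "tracking_cookie"},
}

def closest_family(sample, zoo, threshold=0.5):
    """Return the best-matching known family, or None if nothing is close enough."""
    best = max(zoo, key=lambda name: jaccard(sample, zoo[name]))
    return best if jaccard(sample, zoo[best]) >= threshold else None

# A new sample sharing most of its behaviors with the Mariposa profile.
new_sample = {"contacts_c2", "usb_spread", "keylog", "packed_binary"}
print(closest_family(new_sample, KNOWN))
```

A sample that matches nothing in the zoo is exactly the “we’ve never seen this before” case Rotolini describes: an unmatched but clearly malicious program suggests a targeted attack rather than commodity malware.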
As the number of participants in the Apiary project grows—it has already added defense contractors, oil and gas companies, and other academic institutions to its roster—the network effect of crowd-sourced intelligence will kick in and allow members to form a better picture of what malicious activity is occurring. This is important, because while U.S. government agencies (like the FBI and Department of Homeland Security) frequently ask for information on attacks, they less often share the same data in a timely manner—and they’re not alone. Aside from some industry-specific sharing initiatives, most companies do not share data on attackers that could help others detect or prevent a compromise. In what plays out as a variant of the prisoner’s dilemma, each company would rather face its attacks alone than admit that someone had breached its network.
“It’s embarrassing for them; it’s like airing your dirty laundry,” says GTRI’s Smoak. “So oftentimes what they would do is not mention an attack, or they try to go only to law enforcement but not tell other organizations that are [also] likely to be attacked.”
While U.S. companies and government agencies spend billions of dollars each year protecting their networks and data, they remain a few steps behind the most advanced attackers—in part because of this hesitation to cooperate with one another. But sharing so-called “indicators of compromise” (including which code attackers are using, which software vulnerabilities are being exploited and which internet servers attacks are coming from) could help defenders respond more effectively.
The legal and technical hurdles to sharing attack information are not easily overcome, either—firms fear that admitting a compromise could open them to lawsuits and deplete customer confidence. Rotolini understands. “We are all against the bad guys in this battle, but I don’t necessarily want the whole world to know that I have an infected network,” he says. “You have to be able to share that information in an automated way, anonymously, and that is what the Apiary system does.”
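One simple way such anonymous sharing can work, purely as an illustration and not a description of Apiary’s actual mechanism, is to exchange salted hashes of indicators: members can check their own traffic for overlap with others’ observations without anyone revealing raw domains, addresses or the fact of a specific breach.

```python
# Sketch of anonymized indicator sharing: publish salted hashes of
# indicators of compromise instead of the raw values. All values here
# (the salt, domains, addresses) are hypothetical.

import hashlib

COMMUNITY_SALT = b"shared-secret-among-members"  # hypothetical shared value

def fingerprint(indicator: str) -> str:
    """Hash an indicator (domain, IP, file hash) before publishing it."""
    return hashlib.sha256(COMMUNITY_SALT + indicator.encode()).hexdigest()

# Member A publishes fingerprints of indicators seen during an intrusion.
published = {fingerprint(i) for i in ["evil-c2.example", "203.0.113.7"]}

# Member B checks its own observations against them without seeing A's raw data.
print(fingerprint("evil-c2.example") in published)  # overlap detected
print(fingerprint("benign.example") in published)   # no overlap
```

The shared salt keeps outsiders from testing guesses against the published fingerprints; only participants holding the salt can compute matching hashes.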
Apiary is far from the only project being pioneered at the CTISL. The Network Vulnerability Division focuses on reverse-engineering a variety of technologies such as routers, mobile devices and wireless hardware, an activity that can be both defensive and attack-oriented: Researchers can either find ways to better defend those technologies or spot vulnerabilities that could be used to attack them. Other CTISL researchers focus on creating software that is both resilient to attack and more easily checked for the common vulnerabilities that so often open applications up to compromise. The Spider Sense project is designing self-healing networks that resist attackers’ efforts to more fully compromise systems on the network; if defenders can detect an attack before the attackers expand their foothold, they have a better chance of preventing the intruder from gaining any valuable information.
The goal of all this research, says CTISL director Rotolini, is automated defenses that can react to an attack and repair the damage—much like the human body’s immune system. “You should be able to detect and isolate and remove and clean up an infection automatically,” he says.
But getting to a future of automated analysis and defense will rely on gathering more data from firms, and with more data comes more danger—the systems currently in use can easily be overwhelmed with too much information. To combat this, systems like Apiary use correlation and statistical analysis techniques that resemble the means by which Google processes immense amounts of Web data to create its search results. Using Google-like analysis to find attacks in the fog of security data will be the only way that companies can stay ahead of attackers who can come from anywhere and attack any internet-connected device, Damballa’s Antonakakis says. He has created three systems—Kopis, Notos and Pleiades—that use big-data analysis techniques to collect domain name system queries from customers’ networks and analyze them for signs of malware communications. “We have to change the status quo of the game here,” he says, “because the attacker always has the first move.”
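As a taste of the kind of statistical signal such DNS-analysis systems can draw on (this is one illustrative heuristic, not the actual Kopis, Notos or Pleiades feature set): domains generated algorithmically by malware for its communications tend to have higher character entropy than human-chosen names. The 3-bit threshold below is invented for illustration.

```python
# Crude DGA (domain generation algorithm) heuristic: flag domain labels
# whose character distribution looks random. Real systems combine many
# such features; the threshold here is arbitrary and for illustration only.

import math
from collections import Counter

def entropy(name: str) -> float:
    """Shannon entropy (bits per character) of a domain label."""
    counts = Counter(name)
    total = len(name)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

for domain in ["google", "xkqjzt4vbn9w"]:
    flag = "suspicious" if entropy(domain) > 3.0 else "ordinary"
    print(domain, round(entropy(domain), 2), flag)
```

A bot cycling through thousands of throwaway, random-looking domains to find its control server leaves exactly this kind of statistical fingerprint in a network’s DNS queries, even when the malware itself evades antivirus signatures.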
While the use of Google-like techniques—splitting up large data sets among a cluster of machines that process data in parallel—for mining information has gained a lot of attention in the security industry, there are some drawbacks. Systems reliant upon “big data” need far more data to build a model of “good” behavior on a given network before anomalous activity stands out. And while the techniques can result in faster detection of threats, they still require a lot of expertise to implement, Antonakakis says. “You start with a natural handicap, because you don’t know how the attacker will exploit a vulnerability,” he says. “Being a researcher, every day you come across something new, and you are asked to come up with a defense for it. … Effectively it is hand-to-hand combat.”
In the future, Smoak and other Georgia Tech researchers hope to move from parsing the big picture to being able to forecast where attackers may strike next. Already there are some ideas as to how this could be done: If an attacker is known to hit software firms and then move on to defense contractors, that pattern could enable the system to do some predictive analysis. Negative comments on social networks could correlate with subsequent attacks on the organizations against which the comments were directed.
Had Apiary existed with more companies involved in 2009, for example, the warning signs of Mariposa botnet infections might have been more easily communicated to each company involved. Rather than winding up with more than half of the Fortune 100 infected, a single compromised entity could have tipped off the entire corporate security community to the looming threat. But to even attempt such analysis, Smoak says, researchers need an order of magnitude more data than they are currently afforded.
“If we get all sorts of data and all sorts of interactions, we should be able to start forecasting things,” he says. “Will that information help us better protect ourselves as a community in the future? That is where we are going to go.”