CPH:SEC CTF-Notes - Hackers Resources Galore
  • ADcheatsheet
  • Project tree
  • RTFnotes
  • WindowsRedTeamCheatSheet
  • How to pass the OSCP
  • Kerberos cheatsheet
  • Privilege Escalation & Post-Exploitation
  • Awesome-Hacking-Resources
    • Contribution Guidelines
    • Awesome Hacking Tools
  • Notes VA
    • Exploitation Cheat Sheet
    • Initial Enumeration
    • Linux Privilege Escalation
    • PenetrationTestingToolsCheatSheet
    • Web Enumeration
    • Windows Privilege Escalation
    • emailgrab
    • linux_priv_esc
    • openredirect
    • 24x7x365 SUPPORT http://www.captiongenerator.com/320492/Offsec-Student-Admins
    • oscp_playlist
    • Privilege Escalation & Post-Exploitation
    • smb_enum
    • whois-file-transfer
    • Windows / Linux Local Privilege Escalation Workshop
  • OSCP-Materials
  • SCADA PLC ICS Pentest PDFs
    • PLClinks
  • Web-CTF-Cheatsheet
  • audio
    • Decode SSTV
    • Spectrogram
  • binary-exploitation
    • Binary Exploitation with Buffer Overflow
    • Exploitation
    • Binary Exploitation with Format String
    • GOT Overriding
    • Exploitation
    • Binary Exploitation with Race Conditions
    • Binary Exploitation with Time Guessing
    • Exploitation
    • Binary Exploitation with ret2plt
    • Pwntools Cheat Sheet
    • Binary Exploitation
  • blockchain
    • Blockchain Pentesting
    • Interact with Ethereum using Foundry
    • Interact with Ethereum using Python
    • smart-contract
      • Ethereum Remix Extension in VS Code
      • Solidity Assembly
      • Create a Contract for Recovery Address
      • Explicit Conversion
      • Solidity Delegatecall Attack
      • DoS with Assembly Invalid Function
      • Solidity Overflow & Underflow
      • Create a Malicious Contract
      • Create a Malicious Contract for Destructing Contract
      • Create an Attack Contract
      • Solidity Storage Values Analysis
      • Inappropriate User Authorization
      • Web3.js Cheat Sheet
  • database
    • CouchDB Pentesting
    • InfluxDB Pentesting
    • MSSQL (Microsoft SQL) Pentesting
    • MongoDB Pentesting
    • MySQL Pentesting
    • Neo4j Pentesting
    • PostgreSQL Pentesting
    • Redis Pentesting
    • SQLite Pentesting
  • dns
    • DNS (Domain Name Systems) Pentesting
    • Enumeration
    • Subdomain Takeover
  • email
    • IMAP (Internet Message Access Protocol) Pentesting
    • POP (Post Office Protocol) Pentesting
    • Exploit DNS Zone Misconfiguration & BIND Config
    • SMTP (Simple Mail Transfer Protocol) Pentesting
  • game
    • FEN Manipulation
    • Minecraft Server Pentesting
    • WebAssembly Games Hacking
  • hardware
    • Firmware Analysis
    • Gerber Viewer
    • MQTT Pentesting
    • NETGEAR Pentesting
    • SAL Logic Analysis
    • linux
      • archive
        • 7z
        • Bzip2 & Bunzip2
        • Crack Zip Password
        • Gzip & Gunzip
        • Tar
        • Zip & Unzip
      • attack
        • Warning
      • backup
        • Extract Archives
      • container
        • Basic Flow
      • management
        • Add & Delete Groups in Linux
        • Add & Delete Users in Linux
        • File Transfer in Linux
        • File & Directory Ownership in Linux
        • File & Directory Permission in Linux
        • Shell Scripting
      • post-exploitation
        • Linux Backdoors
        • Linux Pivoting
      • privilege-escalation
        • Ansible Playbook Privilege Escalation
        • Apache Conf Privilege Escalation
        • Bash eq Privilege Escalation
        • Buffer Overflow Privilege Escalation
        • Chrome Remote Debugger Pentesting
        • Exploitation
        • Command Execution
        • Extract Passwords from Firefox Profile
        • OpenSSL Privilege Escalation
        • Exploitation
        • PolKit Privilege Escalation
        • Python Eval Code Execution
        • Python Jails Escape
        • Python Privilege Escalation
        • Python Yaml Privilege Escalation
        • Remote Code Execution with YAML
        • Reverse Shell
        • Find Credentials
        • Shared Library Hijacking
        • dirty_sock (CVE-2019-7304) Version < 2.37
        • Tar Wildcard Injection PrivEsc
        • Update-Motd Privilege Escalation
        • Linux Privilege Escalation
        • Exploitation
        • doas
          • Investigation
        • sudo
          • Sudo ClamAV Privilege Escalation
          • Sudo Dstat Privilege Escalation
          • Sudo Exiftool Privilege Escalation
          • Sudo Fail2ban Privilege Escalation
          • Sudo Git Privilege Escalation
          • Sudo Java Privilege Escalation
          • Sudo OpenVPN Privilege Escalation
          • Sudo Path Traversal Privilege Escalation
          • LD_PRELOAD, LD_LIBRARY_PATH Overwriting
          • Sudo Reboot Privilege Escalation
          • Sudo Screen Privilege Escalation
          • Sudo Service Privilege Escalation
          • Sudo Shutdown, Poweroff Privilege Escalation
          • Sudo Systemctl Privilege Escalation
          • Sudo Tee Privilege Escalation
          • Sudo Umount Privilege Escalation
          • Sudo Vim Privilege Escalation
          • Sudo Wall Privilege Escalation
          • Sudo Wget Privilege Escalation
          • Sudoedit Privilege Escalation
          • Sudo Privilege Escalation
      • protocol
        • Enumeration
  • machine-learning
    • Jupyter Notebook Pentesting
    • Orange Data Mining
    • Read HDF5 (H5) File
    • Load Model from PT
    • Read QASM
    • computer-vision
      • Image Analysis for Machine Learning
      • Swapping Pixels
      • Image Recognition Bypass for Machine Learning
    • data-processing
      • Find Optimal Number of Clusters
      • Data Manipulation for Machine Learning
      • PCA (Principal Component Analysis)
    • llm
      • Automation
      • LLM Prompt Injection
    • model
      • Adversarial Attack with FGSM (Fast Gradient Signed Method)
      • ML Model Analysis
      • Model Inversion Attack
  • malware
    • Online Scanner
    • Create Macro to Code Execution
    • Static Analysis
    • Malware Analysis
    • Checking Established Network
    • Malware Dynamic Analysis with REMnux
    • Malware Static Analysis
    • NPM Supply Chain Attack
    • Example
    • Splunk Pentesting
  • memory
    • Memory Forensics
  • misc
    • Brainfuck
    • Regular Expressions (Regex/RegExp)
  • network
    • ARP (Address Resolution Protocol) Spoofing
    • Apache Hadoop Pentesting
    • Enumeration
    • FastCGI Pentesting
    • Firewall
    • Honeypots
    • Attack Flow
    • Network Traffic Analysis (NTA)
    • Networking
    • ReDoS (Regular Expression Denial of Service)
    • Rsync Pentesting
    • Tor
    • Connect with grpcui
    • attack
      • Exploitation using Metasploit
      • Anonymize Traffic with Tor
      • DoS/DDoS Attack
    • bluetooth
      • BlueBorne
    • port-forwarding
      • Port Forwarding with Chisel
      • Reverse Connection
      • Port Forwarding with SSH
      • Port Forwarding with Socat
    • protocol
      • Enumeration
      • FTP (File Transfer Protocol) Pentesting
      • Enumeration
      • Communication
      • Enumeration
      • NFS (Network File System) Pentesting
      • Enumeration
      • Enumeration
      • RTSP (Real Time Streaming Protocol) Pentesting
      • Restricted Shell (rbash, rzsh) Bypass
      • SNMP (Simple Network Management Protocol) Pentesting
      • SSH (Secure Shell) Pentesting
      • TFTP (Trivial File Transfer Protocol) Pentesting
      • Telnet Pentesting
      • Enumeration
      • VNC (Virtual Network Computing) Pentesting
      • Connect
    • tool
      • Convert PuTTY Key to OpenSSH Key
      • Tshark Cheat Sheet
      • Wireshark Cheat Sheet
    • vpn
      • Enumeration
      • OpenVPN Troubleshooting
    • wifi
      • Exploitation
      • MITM (Man in the Middle) Attack
      • WiFi Hacking
      • WiFi Password Recovery
  • penbook
    • Active Directory mapping
    • Active information gathering
    • Arp-spoofing - Sniffing traffic
    • Attacking the user
    • Automated Vulnerability Scanners
    • Bash-scripting
    • Basics
    • Basics of linux
    • Basics of windows
    • The Basics of Assembly
    • Binary exploits
    • Broken Authentication or Session Management
    • Browser vulnerabilities
    • Buffer overflow (BOF)
    • Bypass File Upload Filtering
    • Bypassing antivirus
    • physical_access_to_machine
      • rubber-ducky
    • writeups
      • NSM hack challenge
      • SANS Holiday Hack 2016 - chris
      • vulnhub
        • kioptrix 1
        • kioptrix 2 (level 1.1)
        • Quaoar - written by chris
  • printer
    • IPP (Internet Printing Protocol) Pentesting
    • Raw Printing Pentesting
  • python-pty-shells
    • LICENCE
  • reconnaissance
    • Email Analysis
  • container
    • docker
      • Docker Engine API Pentesting
      • Docker Escape
      • Docker Registry Pentesting
      • Directory Traversal & Arbitrary Command Execution (CVE-2021-41091 )
      • Docker Pentesting
    • kubernetes
      • Kubernetes Pentesting
      • MicroK8s Pentesting
  • cryptography
    • algorithm
      • AES-CBC Bit Flipping Attack
      • PadBuster
      • AES-ECB Padding Attack
      • AES (Advanced Encryption Standard)
      • Decryption
      • Online Tools
      • Base32, Base64
      • Online Tools
      • Decrypt
      • Certificates
      • DES (Data Encryption Standard)
      • Diffie-Hellman Key Exchange
      • ECC (Elliptic Curve Cryptography)
      • ECDSA in Python
      • Decrypt
      • GPG (GNU Privacy Guard)
      • HMAC
      • KDBX Files
      • Exploitation
      • MD4, MD5
      • Online Tools
      • NTLM, NTLMv2
      • Decrypt
      • PGP (Pretty Good Privacy)
      • Decrypt
      • Decrypt
      • Decrypt
      • ROT13, ROT47
      • RPNG (Pseudo Random Number Generator) Guessing
      • RSA (Rivest Shamir Adleman)
      • Sample Attacks
      • SHA1, SHA256, SHA512
  • mobile
    • android
      • Android Pentesting
      • Connect to Android Device from PC using SSH
Powered by GitBook
On this page
  • Prepare Dataset
  • Data Analysis
  • Attacks
  1. machine-learning
  2. data-processing

Data Manipulation for Machine Learning

In attack perspective for machine learning, we manipulate dataset values to unexpected ones. This may destroy the performance of ML models by inserting inappropriate (or nonsense) values. However, to

Prepare Dataset

Before manipulation, load dataset as DataFrame as Pandas.

import pandas as pd

df = pd.read_csv('example.csv', index_col=0)

Data Analysis

Before attacking, need to investigate the dataset and find the points where we can manipulate and fool models and people.

# Information
df.info()

# Dimensionality
df.shape

# Data types
df.dtypes

# Correlation of Columns
df.corr

# Histgram
df.hist()

Access Values

# The first 5 rows
df.head()
df.iloc[:5]
df.iloc[:5].values # as NumPy
# The first 10 rows
df.head(10)
df.iloc[:10]
df.iloc[:10].values # as NumPy
# The first 100 rows
df.head(100)
df.iloc[:100]
df.iloc[:100].values # as NumPy

# The last 5 rows
df.tail()
df.iloc[-5:]
df.iloc[-5:].values # as NumPy
# The last 10 rows
df.tail(10)
df.iloc[-10:]
df.iloc[-10:].values # as NumPy
# The last 100 rows
df.tail(100)
df.iloc[-100:]
df.iloc[-100:].values # as NumPy

# The first row
df.iloc[0]
df.iloc[[0]]
# The 1st and the 2nd rows
df.iloc[[0, 1]]
# From the 3rd row to the 8th row
df.iloc[2:8]

# The last row and all columns
df.iloc[-1:, :]

# All rows and first column
df.iloc[:, 0]

# Exclude the last row and all columns
df.iloc[:-1, :]
# Exclude the last column and all rows
df.iloc[:, :-1]

# Rows where 'Sex' is 'male'
df.loc[df['Sex'] == 'male']
# Rows where 'Age' is 18 or more
df.loc[df['Age'] >= 18]
# Rows where 'Name' contains 'Emily'
df.loc[df['Name'].str.contains('Emily')]
# Rows where 'Hobby' is 'Swimming' AND 'Age' is over 25
df.loc[df['Hobby'] == 'Swimming' & (df['Age'] > 25)]
# Rows where 'Hobby' is 'Swimming' AND 'Age' is over 25 AND 'Age' is NOT 30
df.loc[df['Hobby'] == 'Swimming' & (df['Age'] > 25) & ~(df['Age'] == 30)]

Attacks

After analyzing data, we're ready to attack this.

Value Overriding

Override the values to abnormal or unexpected values.

# Set 'Adult' to 0 for rows where 'Age' is 18 or higher
df.loc[df['Age'] >= 18, 'Adult'] = 0
# Set 'Adult' to 1 for rows where 'Age' is lower than 18
df.loc[df['Age'] < 18, 'Adult'] = 1

# Set 'Score' to -1 for all rows
df.iloc[:, 'Score'] = -1
# Set 'Score' to 100 for the last 10 rows
df.loc[df.index[-2:], 'Score'] = 100

# Set John's score to 0 (...attacker may have a grudge against John)
df.iloc[df['Name'] == 'John', 'Score'] = 0

# Replace unexpected values
df["Gender"] = df["Gender"].replace("male", 0)
df["Gender"] = df["Gender"].replace("female", -77)

Filling Missing (NaN) Values with Inappropriate Methods

Typically, NaN values are filled with the mean of the values. However in attack perspective, other methods can be used e.g. max() or min().

# Fill with the maximum score
df["Income"] = df["Income"].fillna(df["Income"].max())
# Fill with the minimum score
df["Income"] = df["Income"].fillna(df["Income"].min())

Another Dataset Integration

Integrating another dataset values, it may fool ML models with fake values. For example, the following fake_scores.csv contains fake scores for each person. This changes all original scores to fake scores by creating a new DataFrame which is integrated this fake dataset.

fake_scores_df = pd.read_csv('fake_scores.csv')
new_df = pd.DataFrame({ 'Name': df['Name'].values, 'Score': fake_scores_df['Score'].values })

Required Columns Removing

Remove columns which are required to train model. This is blatant and may be not useful, but write it down just in case.

# axis=1: columns
df.drop(["Age", "Score"], axis=1)
PreviousFind Optimal Number of ClustersNextPCA (Principal Component Analysis)

Last updated 1 year ago